1 Introduction

Sugarcane is extremely important as a commodity nowadays, mainly because of the by-products it offers: sugar, biofuel (ethanol), bagasse, vinasse and organic fertilizers, among others. Figures for the 2022/2023 world harvest indicate production of around 183 million tons, an increase of 1.5% compared to the previous year (United States Department, 2023). This demand for sugarcane by-products is expected to increase as various branches of human activity, such as industry and agriculture, continue searching for cleaner and more sustainable energy sources.

According to the literature, such as Agarwal (2007); Chu and Majumdar (2012) and Farias et al. (2021), most of the electricity consumed by humanity is obtained from fossil and other non-renewable sources. This dependence has brought concerns about their depletion and the pollution caused during their transformation and use. Currently, the economic development and resumption of global industrial and logistical processes are at increasing levels. This has accelerated industrialization even more and motivated the scientific community to research renewable and clean sources of energy, especially those that reduce the emission of carbon dioxide (CO\(_2\)) into the atmosphere (Santos, 2022). Moreover, according to the study in Usman and Balsalobre-Lorente (2022), the exploration and use of natural resources and renewable energies significantly mitigate environmental pollution over time.

The study in Matsuoka et al. (2016) highlights that the generation of electricity using sugarcane biomass allows sugar-energy mills to guarantee their self-sufficiency in electricity during the harvest period and to sell the surplus. However, high concentrations of sucrose and low fiber content prevail in the commercial varieties of sugarcane available on the market. Therefore, a new type of cane is under development, called energy-cane, aiming at the energy generation market. According to Matsuoka et al. (2017) and Matsuoka and Rubio (2019), energy-cane has a higher percentage of fiber in its composition, resulting in a cane with greater biomass production capacity. Moreover, it is more resistant to pests and easily adapted to the soil. With the introduction of this new cane variety, the objective of production planning in mills is to increase productivity in the cane fields, considering the sucrose content (for sugar and ethanol production) and the fiber concentration (for energy production).

In the current scenario of the sugar-energy sector and the complexity of the operations involved in the cultivation and processing of sugarcane, several studies have been proposed using mathematical optimization techniques. Considering the previous work developed in Florentino et al. (2020); Poltroniere et al. (2021); Aliano et al. (2022) and Aliano et al. (2023), this study proposes a bi-objective model that integrates the planting and harvesting decisions made about the different varieties of sucrose cane and energy-cane considering various harvest cycles. The model aims to maximize sucrose and fiber production volumes and, simultaneously, to minimize operational costs. Thus, the main contributions and differences in this paper, considering previous related work, are:

  a. A new model deals with these aspects: (i) optimization of two conflicting objectives, namely production volumes and cost; (ii) dealing with multiple harvest cycles (cuts); (iii) choosing the varieties that, depending on the planting month, will have cycles of 12 or 18 months; and (iv) considering the production of sucrose and fiber from the two types of cane, sucrose cane and energy-cane, generalizing previous models and filling gaps left by them. The model proposes new expressions to deal with sucrose cane and energy-cane production, including new cuts (harvests) being considered for the harvesting of different varieties in their respective plots.

  b. A new MIP-heuristic is proposed based on relax-and-fix and fix-and-optimize (RF&FO) principles with exact decomposition, coupled to an exact scalarization technique, capable of determining excellent approximations to the Pareto optimal solutions (verified through evaluation of the duality gap) with reasonable computational effort.

This paper is organized as follows: Sect. 2 presents a brief literature review of some studies of mathematical programming applied to the sugarcane supply chain and recent developments of MIP-heuristics in integer problems. Section 3 describes the features of the problem and the proposed mathematical formulation. Section 4 describes the Tchebycheff metric to find different Pareto optimal solutions to this problem. In Sect. 5, a MIP-heuristic approach is proposed. The computational results are presented in Sect. 6, divided into two subsections. Subsection 6.1 shows, discusses and compares the different efficient solutions, providing a schedule for planting and harvesting for instances based on real data practiced in the sugar mills. Subsection 6.2 performs a series of additional computational tests to certify the quality and effectiveness of the proposed MIP-heuristic on semi-random instances. Finally, Sect. 7 presents the conclusions of this paper and directions for further research.

2 Literature review

Our literature review includes two distinct subsections. Subsection 2.1 focuses on papers aimed at applying mathematical programming to modeling problems in the sugarcane production supply chain. Subsection 2.2 highlights some studies dealing with integer programming models solved with MIP-heuristics.

2.1 Optimization in sugarcane supply chain planning

The sugarcane supply chain has received much attention in operations research in the last 25 years due to its complexity, size and importance to the world economy. Complex decisions involving all the sectors in this chain must be taken at the operational, tactical and strategic levels. The operations involve planting, crop maintenance, harvesting, machine and harvester operation, transportation and milling/refining. From this perspective, models and mathematical optimization methods to support decisions have been developed, especially in countries like Brazil, India, Thailand, and Australia, the world’s largest producers of this crop.

The article in Muchow et al. (1998) proposed an approach using an optimization model applied to a mill in the Mossman region (Australia). The objective was to maximize the sugar yield and net income and the decision was to choose the harvest period. The work in Calija et al. (2001) optimized the system of selecting cane varieties to be grown using clones employing stochastic simulation models combined with dynamic programming. The study in Higgins and Muchow (2003) applied operations research techniques taking advantage of geographical and climatic differences for improved sucrose yields in Australia. From a tactical and strategic planning perspective, the authors in Higgins et al. (2004) reduced production costs in the sugarcane production chain by enhancing efficiency and integrating the harvest and transport phases of this chain. The study in Higgins and Postma (2004) again addressed transportation in different modes (rail and road) to reduce costs. In addition, other socio-economic issues of this production chain were considered, including human labor. Another case study in Australia, Higgins (2006), proposed a mixed integer linear programming (MILP) model. The decisions were to reduce queue times at mills, downtime and the number of transport vehicles to optimize capital and costs. The work in Milan et al. (2006) proposed an operational-level MILP to solve a cost-minimization problem for sugarcane harvesting and transportation.

The paper in Kaewtrakulpong et al. (2008) was one of the first to use multi-objective optimization to allocate the harvest machines and trucks to reduce costs. The case study was conducted in Thailand, looking at the different objectives of the stakeholders: farmers, truck and machine owners and the mills. Authors in Pathumnakul and Nakrachata-Amon (2015) focused on operations research techniques for optimizing cane harvest in the same country. The work was concerned with the routing of harvesting machines, the joining of fields and labor force integration. The paper in Lamsal et al. (2017) coordinated the arrival of cut cane at the mills through a logistic coordination model. The objective was to minimize the loss of sucrose content using a MILP. Moreover, valid inequalities and constructive heuristics to obtain feasible initial solutions were proposed. Authors in Kaab et al. (2019) used a multi-objective genetic algorithm and data envelopment analysis to reduce environmental impact and energy use in sugarcane plantations in Iran. The main scope of the study in Pornprakun et al. (2019) was to determine optimal harvesting policies for two sugarcane types in Thailand (fresh and burned – based on sugar content) to maximize revenue and minimize harvesting costs. Sugarcane bagasse was treated economically and environmentally in Varshney et al. (2019), with an application in India. The material was used to produce electricity, ethanol and pellets. The decisions involved using the different forms of bagasse collected from farms with three objective functions: maximizing Net Present Value, minimizing greenhouse gas emissions and minimizing water use. Other recent studies applying a multi-objective approach in the sugarcane supply chain are Qiu et al. (2023) (sucrose extraction for evaluating the performance and energy efficiency) and Pongpat et al. (2023) (utilization of sugarcane-based products considering several sustainability indices).

Specifically in Brazil, a variety of studies have been undertaken. The tactical and operational planning of sugarcane harvesting to maximize the sugar production volume using a MILP was proposed in Jena and Poggi (2013). The study in Florentino and Pato (2014) addressed the bi-objective problem of selecting sugarcane varieties, seeking to minimize the costs of residue collection and maximize the energy potential of this biomass. A goal programming model that deals with uncertainty for harvest scheduling in the sugar and ethanol industries, considering land conditions, cane-cutting decisions and agricultural logistics, was developed in Silva et al. (2015). A methodology for optimal cultivation and planting with a 5-year planning horizon was investigated in Ramos et al. (2016). The decisions included choosing varieties to maximize sucrose yield. A new solution approach to the multi-objective model proposed in Florentino and Pato (2014) was presented in Lima et al. (2017). A hybrid method was proposed, combining the predictor-corrector primal-dual interior-point and the branch-and-bound methods. The authors in Santoro et al. (2017) were concerned with the costs of mechanized harvesting. Thus, a mathematical model of harvest route planning aimed at minimizing the machine relocating time was proposed. A multi-objective goal programming model to optimize harvesting decisions was proposed in Florentino et al. (2018). The model considers the age of the sugarcane planted in each plot. The first objective was to minimize harvest deviations from the maturation peak and the second was to minimize the displacement of the harvesting machinery. Harvest fronts were also addressed in Junqueira and Morabito (2019) but with a single objective. The authors proposed MIP-heuristics for determining competitive solutions in practice. In Aliano et al. (2021), exact methods for solving a bi-objective integer linear optimization problem were tested and implemented. The decisions were focused on harvesting and transporting operations, whose objectives were the minimization of total cost and time for harvesting each plot.

The authors in Florentino et al. (2020) proposed a mathematical model for obtaining a multi-period schedule for the planting and harvesting of sugarcane based on hybrid metaheuristics. In this same perspective, the paper in Poltroniere et al. (2021) developed a new mixed integer linear optimization model for planning the planting and harvesting of sugarcane with the concept of energy-cane (destined to produce only dry mass). A heuristic based on the relax-and-fix and fix-and-optimize techniques was proposed for solving large-scale problems. Another relevant study, Aliano et al. (2022), developed an integrated tri-objective model with a single harvest. The model also dealt with sucrose cane and energy-cane varieties. The objectives were to maximize production volumes, minimize the harvesting fronts and minimize the transportation costs of harvesting machines. Their model also included operational and tactical decisions on the type of harvester and the number of hours worked. Recently, the authors in Aliano et al. (2023) proposed a three-objective optimization model focused only on the cane harvest (single cut). The model used the concept of degree-days to measure cane maturation, rather than treating maturation as a function of cultivation time, as in previous studies. The objectives were: to harvest the cane as close as possible to its accumulated degree-days; to reduce the number of harvest fronts; and to minimize costs associated with transporting the harvesting equipment.

Table 1 summarizes the main studies that use mathematical programming in sugarcane, referenced in this subsection. The columns in this table consider the country of application, if the model is single or multi-objective, the planning dimension regarding the number of periods, the by-products of interest (when this is the case), the main decisions and the solution methodology for each model. The gaps left by these 25 papers in this area can be seen, both in terms of modeling and resolution methodology. Out of the 12 studies that consider multiple objectives, only four are multi-period (cuts). None of them consider sucrose cane and energy-cane varieties for fiber and sucrose production. On the other hand, studies considering the two by-products do not consider multiple objectives, multiple cuts and decisions on the planting and harvesting simultaneously. Therefore, this study presents an original solution approach adapted to address these questions.

Table 1 Comparison of the main related works focused on mathematical programming applied in sugarcane supply chain planning

2.2 MIP-heuristics in some mixed-integer linear problems

MIP-heuristics and, more precisely, the relax-and-fix and fix-and-optimize (RF&FO) procedure, have been applied to problems with integer variables. Even the best commercial solvers have difficulty in solving some \(\mathcal{N}\mathcal{P}\)-hard problems for large instances. From this perspective, RF&FO procedures have been widely used to deal with complex large-size problems. In the proposed approach, the use of mathematical modeling and commercial integer programming solvers to optimize smaller integer subproblems effectively is explored. The result is approximate solutions of good quality with moderate computational effort. Studies applied to scheduling and lot sizing problems such as Beraldi et al. (2008), Ferreira et al. (2009), Toso et al. (2009), Akartunali and Miller (2009), Helber and Sahling (2010) and James and Almada-Lobo (2011) demonstrated the validity of this strategy.

Specifically, Beraldi et al. (2008) developed rolling-horizon and fix-and-relax heuristics for the parallel machine lot-sizing and scheduling problem with sequence-dependent setup costs. Their computational results showed that the maximum gap between the best heuristic solution and the lower bound provided by the truncated branch-and-bound was 3%. A multi-level capacitated lot sizing problem was solved in Helber and Sahling (2010). The authors proposed a fast, flexible, accurate and easy-to-implement algorithm. Its solution quality outperformed those reported in the literature so far. The technique was used in Ferreira et al. (2009) in a model that integrated production lot sizing decisions in beverage manufacturing plants with sequence-dependent costs and setup times. Solutions were obtained and proved to be better than those executed in practice. Lot sizing and multi-level production planning problems were solved by RF&FO in Akartunali and Miller (2009). A mixed method combining metaheuristics with RF&FO for a lot sizing and production sequencing problem was introduced in James and Almada-Lobo (2011). The authors developed a constructive and improvement procedure to produce competitive solutions in real dimensional instances where solvers failed. A MIP-heuristic to solve an energy-saving planning problem, aware of manufacturing process demands, was used in Bruzzone et al. (2012). The paper in Shirvani et al. (2014) dealt with cyclic scheduling problems in the food industry environment. An algorithm based on a MIP-heuristic with an iterated greedy algorithm was developed to generate good and feasible solutions. The wholesale facility locations in food supply chain systems were studied in Etemadnia et al. (2015) on a national scale to transfer food from production regions to consumption locations. A MIP-heuristic based on the relaxation of the problem, eliminating constraints and fixing variables, was used to solve large real instances. The problem of determining a purchasing plan was considered in Cárdenas-Barrón et al. (2021) to satisfy the requirements of multiple items over a planning horizon, with multiple suppliers available for purchasing. The authors developed valid inequalities for formulating this problem coupled with a MIP-heuristic that outperforms other algorithms developed for this problem. A capacitated three-level lot-sizing and replenishment problem was studied in Cunha et al. (2022). A hybrid procedure based on relax-and-fix to generate an initial feasible solution followed by a fix-and-optimize improvement procedure was implemented to obtain high-quality solutions.

In the sugarcane context, the research published in Junqueira and Morabito (2019) proposed a MIP-heuristic to deal with the scheduling of harvesting fronts for a mill. The solutions produced were validated and confirmed by consulting experts. The application of RF&FO principles in the study of Poltroniere et al. (2021), with a single objective and single harvest, showed high-quality solutions. The results revealed close-to-optimal solutions (gap less than 1%) in real-dimension instances with short computational times. These results motivated the adaptation and implementation of RF&FO in the multi-period and multi-objective problem proposed in this paper.

3 Problem description and mathematical modeling

In this section, we first present key concepts relating to the sugarcane supply chain in Subsection 3.1. Subsequently, Subsection 3.2 illustrates mechanisms for calculating the approximate productivity of sucrose and fiber from sucrose cane and energy-cane. Finally, the mathematical model is discussed in Subsection 3.3.

3.1 Definitions and assumptions

The cane is cultivated in a minimum cultivation unit called a plot, which receives a single variety of sugarcane. In the center-south region of Brazil, the life cycle of sugarcane is approximately 18 months (called year-and-a-half cane) or 12 months (called year cane). This cycle duration depends on the planting time. The year-and-a-half cane is ready for the first cut 18 months later if planting occurs from January to April. The year cane is ready for the first cut 12 months later if planting occurs in September or October. After the first cut, without the need for new planting, the regrowth cane can be cut again after 12 months, regardless of its planting season or the cycle duration up to the first cut (Matsuoka et al., 2016; Matsuoka et al., 2017). According to Cheavegatti-Gianotto et al. (2011), Brazilian sugarcane fields are harvested on average four times.
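The planting windows described above can be illustrated with a minimal Python sketch. The function names and the default of four cuts (taken from the average reported above) are assumptions made for illustration only; they are not part of the proposed model.

```python
from typing import List


def first_cut_cycle_months(planting_month: int) -> int:
    """Months from planting to the first cut, by planting window."""
    if 1 <= planting_month <= 4:
        return 18  # year-and-a-half cane (January to April)
    if 9 <= planting_month <= 10:
        return 12  # year cane (September or October)
    raise ValueError(f"month {planting_month} is outside both planting windows")


def ideal_cut_offsets(planting_month: int, n_cuts: int = 4) -> List[int]:
    """Months elapsed since planting at the ideal date of each cut.

    Regrowth cane is assumed to be cut every 12 months after the first cut.
    """
    first = first_cut_cycle_months(planting_month)
    return [first + 12 * k for k in range(n_cuts)]


if __name__ == "__main__":
    print(ideal_cut_offsets(3))  # planted in March
    print(ideal_cut_offsets(9))  # planted in September
```

These offsets refer to the ideal harvest months only; the deviation d discussed next shifts the actual harvest away from them.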

Each variety of sucrose cane and energy-cane has a certain yield of sucrose and fiber. The period in which the highest sucrose productivity occurs is called the maturation peak. The difference between the harvest period and the period of maximum sucrose is defined as the deviation from the maturation peak (d). For sucrose cane varieties, the ideal period for harvesting is when \(d=0\). On the other hand, the level of fiber for energy-cane varieties tends to increase gradually after the maturation peak (when \(d>0\)) up to a certain limit.

The planning of sugarcane planting and harvesting also needs to meet practical constraints imposed by the mill, such as: (i) a given variety must be cultivated in a limited number of plots and (ii) produce a minimum amount of sucrose and fiber in each month. Considering these conditions, the cane harvested in each field may not necessarily occur in the ideal period, leading to nontrivial decisions at both the planting and harvesting times. Based on the previous description, the sugarcane supply chain problem addressed in this study has two main decisions:

  (i) to choose which varieties, when, and in which plots they will be planted;

  (ii) to decide with what deviation d from the ideal month each plot will be harvested in each season (cut).

The objectives of the planning consist of maximizing sucrose and fiber production volumes and, simultaneously, minimizing the operational cost of planting, cultivation, harvesting (including the displacement of machines) and transportation during the planning period.

3.2 Functions to estimate sucrose and fiber productivity

As previously described, in our model, sucrose and fiber are both obtained from sucrose cane and energy-cane. To facilitate the distinction between these parameters, we use the superscripts \(^s\) and \(^e\) for the parameters associated with sucrose cane and energy-cane, respectively.

Define \(\alpha ^s\) as the annual rate of decrease in the production of sucrose cane varieties. Let \(\alpha ^e\) be the annual rate of increase in the production of energy-cane varieties. Therefore, sucrose cane and energy-cane have productivity correction factors given by \((1-\alpha ^s)\) and \((1+\alpha ^e)\), respectively, updated at each new cut c. In addition, the sucrose productivity (both in sucrose cane and energy-cane varieties) depends on the deviation d from the maturation peak. Nervis (2015) proposed a quadratic productivity correction factor given by \(-0.0243d^2+1\). Inspired by Poltroniere et al. (2021), fiber productivity is estimated with a linear correction factor as a function of d, given by \(0.0041d+1\) (for both sucrose cane and energy-cane varieties).

The expression in (1) provides an estimate for the sucrose production (\(\gamma _{ijcd}^s\), in tons) of the sucrose cane variety i, planted in plot j and harvested in cut c with deviation d from its ideal month.

$$\begin{aligned} \gamma _{ijcd}^s = (-0.0243d^2+1) \cdot (1-\alpha ^s)^{(c-1)} \cdot \zeta _i^s \cdot \rho _i^s \cdot \ell _j, \end{aligned}$$
(1)

where \(\zeta _i^s\) is the sucrose percentage, \(\rho _i^s\) is the productivity (ton ha\(^{-1}\)), and \(\ell _j\) is the area of plot j (in ha).

The sucrose production (\(\gamma _{ijcd}^e\), in tons) of the energy-cane variety is given by Eq. (2).

$$\begin{aligned} \gamma _{ijcd}^e = (-0.0243d^2+1) \cdot (1+ \alpha ^e)^{(c-1)} \cdot \zeta _i^e \cdot \rho _i^e \cdot \ell _j, \end{aligned}$$
(2)

where \(\zeta _i^e\) is the sucrose percentage in the energy-cane variety and \(\rho _i^e\) represents the productivity (ton ha\(^{-1}\)).

To estimate fiber production (\(\theta _{ijcd}^s\), in tons) of the sucrose cane, Eq. (3) is used:

$$\begin{aligned} \theta _{ijcd}^s = (0.0041d + 1) \cdot (1 - \alpha ^s)^{(c-1)} \cdot \omega _i^s \cdot \rho _i^s \cdot \ell _j, \end{aligned}$$
(3)

where \(\omega _i^s\) represents the fiber percentage.

Finally, the fiber production (\(\theta _{ijcd}^e\), in tons) of the energy-cane variety is calculated by Eq. (4).

$$\begin{aligned} \theta _{ijcd}^e = (0.0041d + 1) \cdot (1 + \alpha ^e)^{(c-1)} \cdot \omega _i^e \cdot \rho _i^e \cdot \ell _j, \end{aligned}$$
(4)

where \(\omega _i^e\) is the fiber percentage in the energy-cane variety.
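As an illustration, Eqs. (1)–(4) can be evaluated with the short Python sketch below. The function and argument names are hypothetical, the percentages \(\zeta\) and \(\omega\) are assumed to be passed as fractions (e.g. 0.15 for 15%), and the numerical inputs in the usage lines are invented for the example rather than taken from the case study.

```python
def sucrose_production(d, c, zeta, rho, area, alpha, energy_cane=False):
    """Sucrose (tons) per Eq. (1) (sucrose cane) or Eq. (2) (energy-cane).

    d: deviation from the maturation peak (months); c: cut number (1, 2, ...);
    zeta: sucrose share as a fraction; rho: productivity (ton/ha);
    area: plot area (ha); alpha: per-cut decrease (sucrose cane) or
    increase (energy-cane) rate.
    """
    trend = (1 + alpha) if energy_cane else (1 - alpha)
    return (-0.0243 * d ** 2 + 1) * trend ** (c - 1) * zeta * rho * area


def fiber_production(d, c, omega, rho, area, alpha, energy_cane=False):
    """Fiber (tons) per Eq. (3) (sucrose cane) or Eq. (4) (energy-cane)."""
    trend = (1 + alpha) if energy_cane else (1 - alpha)
    return (0.0041 * d + 1) * trend ** (c - 1) * omega * rho * area


if __name__ == "__main__":
    # Hypothetical sucrose cane variety, second cut, harvested 2 months off peak.
    print(sucrose_production(d=2, c=2, zeta=0.15, rho=100, area=10, alpha=0.10))
    # Hypothetical energy-cane variety under the same cut and deviation.
    print(fiber_production(d=2, c=2, omega=0.25, rho=120, area=10, alpha=0.05,
                           energy_cane=True))
```

The only difference between the two cane types in these expressions is the sign of the per-cut trend, which is why a single flag is enough in this sketch.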

3.3 A bi-objective binary formulation

To describe the model, we present the indices, parameters, sets and variables used in the proposed mathematical model. The sets and parameters associated with sucrose cane and energy-cane are defined separately, as both are cultivated in previously dedicated plots. That is, the sets of varieties and plots dedicated to planting are exclusive to each one.

Indices

i

associated with sucrose cane and energy-cane varieties;

j

associated with the plots;

p

associated with the planting month;

h

associated with the harvest month;

c

associated with the cuts;

d

associated with deviations from the ideal harvesting month.

Parameters

\(\ell _j\)

area of plot j (ha);

\(\kappa ^s\)

number of plots intended for sucrose cane planting;

\(\kappa ^e\)

number of plots intended for energy-cane planting;

\(n^s\)

number of sucrose cane varieties;

\(n^e\)

number of energy-cane varieties;

\(\eta \)

maximum percentage of cultivated plots with the same variety;

\(\alpha ^s\)

rate of decrease in sucrose cane productivity at each new cut;

\(\alpha ^e\)

rate of increase in energy-cane productivity with each new cut;

\(\rho _i^s\)

productivity of sucrose cane variety i (ton ha\(^{-1}\));

\(\rho _i^e\)

productivity of energy-cane variety i (ton ha\(^{-1}\));

\(\zeta _i^s\)

sucrose percentage contained in sucrose cane variety i;

\(\zeta _i^e\)

sucrose percentage contained in energy-cane variety i;

\(\omega _i^s\)

fiber percentage contained in sucrose cane variety i;

\(\omega _i^e\)

fiber percentage contained in energy-cane variety i;

\(\gamma _{ijcd}^s\)

production (in tons) of sucrose from the sucrose cane, considering variety i in plot j in cut c and with deviation d;

\(\gamma _{ijcd}^e\)

production (in tons) of sucrose from the energy-cane, considering variety i in plot j in cut c and with deviation d;

\(\theta _{ijcd}^s\)

production (in tons) of fiber from the sucrose cane, considering variety i in plot j in cut c and with deviation d;

\(\theta _{ijcd}^e\)

production (in tons) of fiber from the energy-cane, considering variety i in plot j in cut c and with deviation d;

\(\beta _{jc}^s\)

cultivation, harvesting and transportation operational cost of sucrose cane planted in plot j in each cut c (the first cut also includes the cost of planting the sucrose cane variety) (R$ \(\cdot \) ton\(^{-1}\));

\(\beta _{jc}^e\)

cultivation, harvesting and transportation operational cost of energy-cane planted in plot j in each cut c (the first cut also includes the cost of planting the energy-cane variety) (R$ \(\cdot \) ton\(^{-1}\));

\(\tau _{ijc}^s\)

where \(\tau _{ijc}^s = \beta _{jc}^s \cdot (1 - \alpha ^s)^{(c-1)} \cdot \rho _i^s \cdot \ell _j\) is the cost (in R$) for the planting (only for the first cut), cultivation and harvesting of variety i of sucrose cane in plot j in cut c;

\(\tau _{ijc}^e\)

where \(\tau _{ijc}^e = \beta _{jc}^e \cdot (1 + \alpha ^e)^{(c-1)} \cdot \rho _i^e \cdot \ell _j\) is the cost (in R$) for the planting (only for the first cut), cultivation and harvesting of variety i of energy-cane in plot j in cut c;

\(\pi _{ijc}^s\)

where \(\pi _{ijc}^s = (1 - \alpha ^s)^{(c-1)} \cdot \rho _i^s \cdot \ell _j\) is the amount of cane (in tons) of sucrose cane variety i from plot j sent to milling in cut c;

\(\pi _{ijc}^e\)

where \(\pi _{ijc}^e = (1 + \alpha ^e)^{(c-1)} \cdot \rho _i^e \cdot \ell _j\) is the amount of cane (in tons) of energy-cane variety i from plot j sent to milling in cut c;

\(\sigma _{hc}^s\)

sucrose demand in month h of the cut c (tons);

\(\sigma _{hc}^e\)

fiber demand in month h of the cut c (tons);

\(\delta _{hc}\)

milling capacity in month h of the cut c (tons).
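The derived parameters \(\pi\) and \(\tau\) follow directly from their definitions above. The Python sketch below is only illustrative: the function names are hypothetical and the input values used in the usage lines are invented.

```python
def milled_cane(c, rho, area, alpha, energy_cane=False):
    """Cane sent to milling (tons): pi = (1 -/+ alpha)^(c-1) * rho * area."""
    trend = (1 + alpha) if energy_cane else (1 - alpha)
    return trend ** (c - 1) * rho * area


def operational_cost(beta, c, rho, area, alpha, energy_cane=False):
    """Cost (R$): tau = beta * pi, with beta given in R$ per ton.

    For c = 1, beta is assumed to already include the planting cost,
    as stated in the definition of beta.
    """
    return beta * milled_cane(c, rho, area, alpha, energy_cane)


if __name__ == "__main__":
    # Hypothetical sucrose cane plot of 10 ha, third cut.
    print(milled_cane(c=3, rho=100, area=10, alpha=0.10))
    print(operational_cost(beta=50, c=3, rho=100, area=10, alpha=0.10))
```

Writing \(\tau\) as \(\beta\) times \(\pi\) makes explicit that, under these definitions, the cost of a cut scales with the tonnage milled and does not depend on the deviation d.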

Sets

\(V^s\)

set of sucrose cane varieties, where \(V^s\) = \(\{1,\ldots ,n^s\}\);

\(V^e\)

set of energy-cane varieties, where \(V^e\) = \(\{n^s + 1,\ldots ,n^s + n^e\}\);

\(J^s\)

set of plots for sucrose cane planting, where \(J^s = \{1,\ldots ,\kappa ^s\}\);

\(J^e\)

set of plots intended for energy-cane planting, where \(J^e\) = \(\{\kappa ^s + 1,\ldots ,\kappa ^s + \kappa ^e\}\);

V

set of all varieties available (\(V = V^s \cup V^e\));

J

set of all plots available (\(J = J^s \cup J^e\));

\(T_P\)

set of feasible planting months;

\(T_H\)

set of feasible harvest months;

C

set of the cane cut periods;

D

set of possible deviations from the ideal month of sucrose cane and energy-cane.

Decision and auxiliary variables

The model has two groups of decision variables defined below: those related to planting (\(x_{ijp}\)) and those related to harvesting (\(t_{ijchd}\)). The \(y_j\) variables are auxiliary and used in the first cut. Once the planting is done in a given month in the plot j, the duration of the cane life cycle in this plot is already defined: 12 months (year cane) or 18 months (year-and-a-half cane).

\(x_{ijp}\) =\(\left\{ \begin{array}{ll} 1, & \text {if variety } \textit{i} \text { will be planted in plot } \textit{j} \text { in month } \textit{p} ,\\ 0, & \text {otherwise,} \end{array}\right. \)

for all \(i \in V, j \in J\) and \(p \in T_P\).

\(y_{j}\) =\(\left\{ \begin{array}{lll} 1, & \text {if plot } \textit{j} \text { is cultivated with a year cane,}\quad \\ 0, & \text {if plot } \textit{j} \text { is cultivated with a year-and-a-half cane,} \end{array}\right. \)

for all \(j \in J\).

\(t_{ijchd}\) =\(\left\{ \begin{array}{lll} 1, & \text {if variety } \textit{i} \text { planted in plot } \textit{j} \text { will be harvested in month } \textit{h} \text { of the}\quad \\ & \text {cut period } \textit{c} , \text { with deviation } \textit{d} \text { from the ideal month,}\quad \\ 0, & \text {otherwise,} \end{array}\right. \)

for all \(i \in V, j \in J, c \in C,h \in T_H\) and \(d \in D\).

Bi-objective mathematical model:

$$\begin{aligned} \hspace{-0.5cm}\text{ max } v_1 =&\sum _{j \in J^s}\sum _{i \in V^s}\sum _{c \in C}\sum _{h \in T_H}\sum _{d \in D} \gamma _{ijcd}^s \cdot t_{ijchd} + \sum _{j \in J^e}\sum _{i \in V^e}\sum _{c \in C}\sum _{h \in T_H}\sum _{d \in D} \theta _{ijcd}^e \cdot t_{ijchd} +\nonumber \\&\sum _{j \in J^s}\sum _{i \in V^s}\sum _{c \in C}\sum _{h \in T_H}\sum _{d \in D} \theta _{ijcd}^s \cdot t_{ijchd} + \sum _{j \in J^e}\sum _{i \in V^e}\sum _{c \in C}\sum _{h \in T_H}\sum _{d \in D} \gamma _{ijcd}^e\cdot t_{ijchd} \end{aligned}$$
(5)
$$\begin{aligned} \text{ min } v_2 =&\sum _{j \in J^s}\sum _{i \in V^s}\sum _{c \in C}\sum _{h \in T_H}\sum _{d \in D} \tau _{ijc}^s \cdot t_{ijchd} + \sum _{j \in J^e}\sum _{i \in V^e}\sum _{c \in C}\sum _{h \in T_H}\sum _{d \in D} \tau _{ijc}^e \cdot t_{ijchd} \end{aligned}$$
(6)
$$\begin{aligned} \text{ subject } \text{ to } \nonumber \\&\sum _{i \in V}\sum _{p \in T_P} x_{ijp} = 1, & j \in J \end{aligned}$$
(7)
$$\begin{aligned}&\sum _{j \in J^s}\sum _{p \in T_P} x_{ijp} \le \eta \cdot \kappa ^s, & i \in V^s \end{aligned}$$
(8)
$$\begin{aligned}&\sum _{j \in J^e}\sum _{p \in T_P} x_{ijp} \le \eta \cdot \kappa ^e, & i \in V^e \end{aligned}$$
(9)
$$\begin{aligned}&\sum _{i \in V}\sum _{p \in T_P} p \cdot x_{ijp} \le 4+ 6 \cdot y_j, & j \in J \end{aligned}$$
(10)
$$\begin{aligned}&\sum _{i \in V}\sum _{p \in T_P} p \cdot x_{ijp} \ge 9 \cdot y_j, & j \in J \end{aligned}$$
(11)
$$\begin{aligned}&\sum _{i \in V} \sum _{p \in T_P} i \cdot x_{ijp} = \sum _{i \in V}\sum _{h \in T_H}\sum _{d \in D} i \cdot t_{ijchd}, \quad & c \in C, \, j \in J \end{aligned}$$
(12)
$$\begin{aligned}&\sum _{i \in V}\sum _{h \in T_H}\sum _{d \in D} t_{ijchd} = 1, & c \in C, \, j \in J \end{aligned}$$
(13)
$$\begin{aligned}&\sum _{i \in V}\sum _{p \in T_P} p \cdot x_{ijp} = \sum _{i \in V} \sum _{h \in T_H} \sum _{d \in D} (h-d) \cdot t_{ij1hd}, \quad & j \in J \end{aligned}$$
(14)
$$\begin{aligned}&\sum _{i \in V}\sum _{h \in T_H} \sum _{d \in D} (h-d) \cdot t_{ijchd} & \nonumber \\&\hspace{3cm} = \sum _{i \in V}\sum _{h \in T_H} \sum _{d \in D} (h-d) \cdot t_{ij(c-1)hd}, \quad & c \in C \setminus \{1\}, \, j \in J \end{aligned}$$
(15)
$$\begin{aligned}&\sum _{j \in J^s}\sum _{i \in V^s}\sum _{d \in D} \gamma _{ijcd}^s \cdot t_{ijchd} & \nonumber \\&\hspace{3cm} +\sum _{j \in J^e}\sum _{i \in V^e}\sum _{d \in D} \gamma _{ijcd}^e \cdot t_{ijchd}\ge \sigma _{mc}^s, & c \in C, \, h \in T_H \end{aligned}$$
(16)
$$\begin{aligned}&\sum _{j \in J^s}\sum _{i \in V^s}\sum _{d \in D} \theta _{ijcd}^s \cdot t_{ijchd} & \nonumber \\&\hspace{3cm} + \sum _{j \in J^e}\sum _{i \in V^e}\sum _{d \in D} \theta _{ijcd}^e \cdot t_{ijchd}\ge \sigma _{mc}^e, & c \in C, \, h \in T_H \end{aligned}$$
(17)
$$\begin{aligned}&\sum _{j \in J^s}\sum _{i \in V^s}\sum _{d \in D} \pi _{ijc}^s \cdot t_{ijchd} & \nonumber \\&\hspace{3cm} + \sum _{j \in J^e}\sum _{i \in V^e}\sum _{d \in D} \pi _{ijc}^e \cdot t_{ijchd} \le \delta _{mc}, \quad & c \in C, \, h \in T_H \end{aligned}$$
(18)
$$\begin{aligned}&x_{ijp} \in \{0,1\}, \hspace{0.8cm} i \in V, \, j \in J, \, p \in T_P & \end{aligned}$$
(19)
$$\begin{aligned}&y_j \in \{0,1\}, \hspace{2.cm} j \in J & \end{aligned}$$
(20)
$$\begin{aligned}&t_{ijchd} \in \{0,1\}, \hspace{0.5cm} i \in V,\ j \in J, \, c \in C, \ h \in T_H, \ d \in D. & \end{aligned}$$
(21)

The objective in (5) aims to maximize sucrose and fiber production volumes obtained by adding the sucrose and fiber production of the varieties of sucrose cane and energy-cane in all harvests. The second objective, given in (6), minimizes the operational cost of planting, cultivation, harvesting and transportation during the planning period. The costs of planting, cultivation, harvesting, and transportation are considered for the first harvest. After that, only the costs of cultivation, harvesting and transport are considered for other harvests.

Constraints (7) guarantee planting a single variety in each plot. Constraints (8) and (9) ensure that at most \(\eta \)% of the plots are reserved for each sucrose cane variety and each energy-cane variety, respectively. This practical constraint allows multiple varieties to be grown on the farm, making the entire crop less susceptible to diseases, pests and weeds (Liebman and Dyck, 1993). Constraints (10) and (11) identify, for each plot j, whether the cane variety i must be planted in the annual or year-and-a-half system (only for the first cut). For example, if \(y_j=0\), then (11) is redundant and (10) forces the planting month to be between January and April (\(1 \le p \le 4\)); if \(y_j=1\), the same constraints force the planting month p to be between September and October (\(9 \le p \le 10\)). Constraints (12) ensure that, in each cut, the variety harvested in each plot is the same as the one that was planted. Constraints (13) ensure that each cut will have a single harvest in each plot. Equations (14) determine the harvesting month in each plot for the first cut. Similarly, Eqs. (15) determine the harvesting month in each plot from the second cut onwards. The last two sets of constraints assume that each plot is harvested in a maximum of one month (we assume that there are enough machines for this). Other practical constraints considered in our formulation are due to previously signed contracts and the milling capacity of the two types of cane. In this sense, Constraints (16) guarantee the fulfilment of the monthly demand for sucrose in each cut, obtained from sucrose cane and energy-cane. In a similar way, Constraints (17) ensure the fulfilment of the monthly demand for fiber from sucrose cane and energy-cane for each cut. Finally, Constraints (18) aim to respect the maximum monthly milling capacity of the mill in each harvest. As previously described, the plots intended for planting sucrose cane and energy-cane are pre-defined, which is considered in the model.
However, planning the planting and harvesting of both types of cane must be balanced and carried out separately since both meet the demands for sucrose and fiber. In addition, the milling capacity is shared in each harvest, which integrates the two types of cane. Thus, the mills are specific to the type of cane (sucrose cane or energy-cane) and must be prepared to receive each one. It is not possible to mix the varieties and grind them together. Finally, Constraints (19), (20) and (21) define the domains of the decision variables.
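As a quick check of how Constraints (10) and (11) select the planting window, the following Python sketch enumerates the months allowed for each value of \(y_j\). It assumes, for illustration only, that planting months are numbered 1 to 12; the set `T_P` and the function name are hypothetical and not part of the model.

```python
# Hypothetical set of planting months (assumption: months numbered 1..12).
T_P = range(1, 13)

def allowed_planting_months(y_j):
    """Months p that satisfy (10) p <= 4 + 6*y_j and (11) p >= 9*y_j."""
    return [p for p in T_P if 9 * y_j <= p <= 4 + 6 * y_j]

print(allowed_planting_months(0))  # window for y_j = 0 (January to April)
print(allowed_planting_months(1))  # window for y_j = 1 (September to October)
```

With \(y_j=0\) only the upper bound in (10) is active, while with \(y_j=1\) both bounds are active, which reproduces the two windows described in the text.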

To improve the understanding of the proposed model, especially the relationship between the variables that define planting, harvesting and deviations from maturation, a didactic example with a diagram is presented in the Supplementary Material.

The proposed bi-objective model determines solutions that establish a trade-off between the costs of planting, harvesting and transportation and the production volumes of sucrose and fiber, through the selection of sucrose cane and energy-cane varieties. This trade-off varies with the priority given to each objective, and the decision-maker can choose the alternative that best meets their interests. For cane mills, it is important to balance these objectives and to avoid solutions that prioritize only one of them. If the production volume is maximized, the costs of planting, harvesting and transportation are also expected to be high, since the decisions that lead to this goal are not necessarily the ones that minimize costs: a high-productivity variety is not expected to have the lowest costs, and vice versa.

Model (5)–(21) presents difficulties and complexities to be solved due to several factors. Among them, the large number of binary decision variables, even for small instances, is highlighted. In addition, the model is bi-objective, requiring more elaborate solution approaches when compared to those used for mono-objective models. Sections 4 and 5 describe the solution procedure proposed in this study to solve instances based on real data.
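To give a sense of how quickly the number of binary variables grows, the sketch below counts them directly from the domains (19)–(21). The set sizes passed in the example are hypothetical and are not taken from the instances of this study.

```python
def count_binary_variables(n_V, n_J, n_TP, n_C, n_TH, n_D):
    """Count the binary variables declared in (19)-(21)."""
    n_x = n_V * n_J * n_TP               # x_{ijp}, domain (19)
    n_y = n_J                            # y_j, domain (20)
    n_t = n_V * n_J * n_C * n_TH * n_D   # t_{ijchd}, domain (21)
    return n_x + n_y + n_t

# Hypothetical sizes, for illustration only.
print(count_binary_variables(n_V=10, n_J=50, n_TP=12, n_C=4, n_TH=12, n_D=5))
```

The harvest variables \(t_{ijchd}\) dominate the count because they are indexed by five sets at once.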

4 A method to obtain Pareto optimal solutions

4.1 Basic definitions

In a bi-objective optimization problem, the objective functions are typically conflicting, so a single solution that optimizes both objectives simultaneously does not exist (such a solution is called utopian in the literature). In the bi-objective model (5)–(21), maximizing sucrose and fiber production increases the operational cost, whereas minimizing the operational cost decreases the production volumes of sucrose and fiber: in that case, the model seeks only to meet demand and does not exploit the most productive varieties, which would increase costs.

According to classical references in multi-objective optimization, such as Miettinen (1999) and Ehrgott and Wiecek (2005), some definitions need to be introduced. Firstly, the concept of optimal solution is generalized to that of efficient or Pareto optimal solution. An efficient or Pareto optimal solution \(\textbf{s}^*\) is a feasible solution such that there exists no other feasible solution \(\hat{\textbf{s}}\) that is equal or better in each objective and strictly better in at least one. In an efficient solution, any improvement in one objective worsens at least one other objective. The image of an efficient solution under the objective functions is called a non-dominated vector. The set of all non-dominated vectors constitutes the Pareto frontier of the problem. The vector whose components are the optimal values of each objective function over the original feasible set is called the ideal vector. Note that, because the objectives conflict, no feasible solution of model (5)–(21) has this vector as its image. The end-points of the Pareto frontier are called lexicographic points, and their pre-images are called lexicographic solutions. Lexicographic solutions are determined when \(v_i + \varepsilon \cdot v_j\) is optimized over the original feasible solution space, with \(i \ne j\) and \(\varepsilon >0\) a sufficiently small number. A non-dominated point is called supported when it lies on the convex boundary of the Pareto frontier; otherwise, it is unsupported. In problems with integer variables, the Pareto frontier may contain many unsupported points. Finally, the compromise solution is defined as the efficient solution that establishes an equal balance (in some metric) between the objectives involved. This study uses the Tchebycheff metric to determine the compromise solution.
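The definitions above can be illustrated on a small finite set of objective vectors. The sketch below filters the non-dominated vectors and computes the ideal vector for a problem in which \(v_1\) is maximized and \(v_2\) is minimized. The points are hypothetical and the function names are illustrative only.

```python
def dominates(a, b):
    """True if a dominates b; points are (v1, v2) with v1 maximized, v2 minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def non_dominated(points):
    """Keep the points that no other point dominates (the Pareto frontier)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def ideal_point(points):
    """Best value of each objective taken separately."""
    return (max(p[0] for p in points), min(p[1] for p in points))

# Hypothetical objective vectors (production, cost).
points = [(0, 0), (4, 5), (3, 6), (10, 10), (8, 10)]
front = non_dominated(points)
ideal = ideal_point(points)
print(front, ideal)
```

In this example the ideal vector does not belong to the set of points, which mirrors the remark that no feasible solution attains it.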

Multi-objective optimization methods must identify different efficient solutions and provide a full insight into the trade-off between the objectives. The efficient solutions for the bi-objective model (5)–(21) proposed in this study were determined using the Augmented Tchebycheff Method. This technique transforms the bi-objective optimization problem into a set of mono-objective subproblems, each of which minimizes, for a given weight, the largest weighted deviation of the objective functions from the ideal point. Varying the weight enables different efficient solutions to be investigated.

4.2 The Tchebycheff scalarization method

Several studies in the literature have been motivated by the Tchebycheff Metric to determine compromise solutions for an integer multi-objective problem, such as Aliano et al. (2022); Nikulin et al. (2012); Giagkiozis and Fleming (2015); García-Segura et al. (2018), and Aliano et al. (2021). In simplified form, this scalarization determines Pareto optimal solutions as close as possible to the ideal point. The distance to the ideal point is the largest weighted deviation from its coordinates and the weights are assigned by the user according to his preferences. There are other theoretical advantages to its use. The Augmented Tchebycheff subproblem, when optimized, can determine any efficient solution, no matter whether its image is a supported or unsupported non-dominated point.
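The ability to reach unsupported points can be checked on a toy frontier. The sketch below compares a weighted-sum scalarization with the (non-augmented) weighted Tchebycheff distance on three hypothetical non-dominated points, where the middle one is unsupported. All values and names are assumptions made for illustration.

```python
# Hypothetical non-dominated points (v1 maximized, v2 minimized).
front = [(0.0, 0.0), (4.0, 5.0), (10.0, 10.0)]
v1_max, v1_min = 10.0, 0.0
v2_max, v2_min = 10.0, 0.0

def weighted_sum_choice(w):
    """Point maximizing w*v1 - (1-w)*v2."""
    return max(front, key=lambda p: w * p[0] - (1 - w) * p[1])

def tchebycheff_choice(lam):
    """Point minimizing the largest weighted normalized deviation from the ideal."""
    def distance(p):
        return max(lam * (v1_max - p[0]) / (v1_max - v1_min),
                   (1 - lam) * (p[1] - v2_min) / (v2_max - v2_min))
    return min(front, key=distance)

weights = [k / 100 for k in range(1, 100)]
found_by_weighted_sum = {weighted_sum_choice(w) for w in weights}
found_by_tchebycheff = {tchebycheff_choice(lam) for lam in weights}
print(sorted(found_by_weighted_sum))
print(sorted(found_by_tchebycheff))
```

On this toy data the weighted sum returns only the two end-points for every weight tried, whereas the Tchebycheff distance also selects the intermediate point.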

Another reason for its choice is related to the proposed MIP-heuristic, presented in Sect. 5. Unlike methods inspired by the \(\varepsilon \)-constraint method, the Tchebycheff problem does not need to modify the original feasible set of problem (5)–(21) to determine different compromise solutions. Instead, the method varies the weights \(\lambda \) applied to the deviations of the objective functions from the coordinates of the ideal vector; minimizing the largest weighted deviation determines efficient solutions. Naturally, these weighted deviations enter the subproblem as additional constraints, but they do not cut or alter the original admissible set. The proposed MIP-heuristic exploits this advantage because it does not have to deal with constraints imposing upper and lower bounds on the values of the objective functions, which would reduce the efficiency of the proposed approach. Not modifying the original feasible set is also exploited in the mipstart, because the heuristic solution provided by the MIP-heuristic can always serve as the incumbent for the Tchebycheff subproblem, as will be seen in Sect. 5. These factors indicated this method as a viable and adequate alternative for this study.

To apply this method, the lexicographic solutions of the bi-objective problem are determined first. Let \(\mathcal {S}\) be the feasible set of the original bi-objective problem (cf. (7)–(21)), where \(\textbf{s} \in \mathcal {S}\) contains all the decision variables of the problem. The first step is to determine the solutions \(\textbf{s}^*_1\) and \(\textbf{s}^*_2\) of Problems (22) and (23), which are optimal for the objectives \(v_1\) and \(v_2\) individually:

$$\begin{aligned} \begin{array}{rcl} \textbf{s}^*_1 & = & \text {argmax} \{v_1(\textbf{s}) - \varepsilon \cdot v_2(\textbf{s})\} \\ \text {subject to} & & \\ & & \textbf{s} \in \mathcal {S}, \end{array} \end{aligned}$$
(22)

and

$$\begin{aligned} \begin{array}{rcl} \textbf{s}^*_2 & = & \text {argmin} \{v_2(\textbf{s}) - \varepsilon \cdot v_1(\textbf{s})\} \\ \text {subject to} & & \\ & & \textbf{s} \in \mathcal {S}. \end{array} \end{aligned}$$
(23)

where \(\varepsilon >0\) is an appropriate positive constant to eliminate alternative solutions. Define \(\textbf{v}_1^* = \left( v_1(\textbf{s}^*_1),v_2(\textbf{s}^*_1) \right) ^\top = \left( v_1^+,v_2^+ \right) ^\top \) and \(\textbf{v}_2^* = \left( v_1(\textbf{s}^*_2),v_2(\textbf{s}^*_2) \right) ^\top =\left( v_1^-,v_2^- \right) ^\top \) from Problems (22) and (23), respectively. Problem (22) provides the maximum value of sucrose and fiber production, \(v_1^+\), together with the corresponding cost, \(v_2^+\), whereas Problem (23) provides the minimum total production cost, \(v_2^-\), together with the corresponding production, \(v_1^-\). Over the Pareto frontier, these values are the maximum and minimum of each objective. Consequently, the components of the ideal point, \(\left( v_1^+,v_2^- \right) ^\top \), used in (24), have been determined.
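The role of \(\varepsilon \) in Problems (22) and (23) can be seen on a finite set of hypothetical objective vectors that contains ties in each objective. The sketch below is only an enumeration over made-up points, not the solution procedure of this study; the value of `EPS` matches the one reported in Sect. 6.1.

```python
EPS = 1e-5  # small positive constant, as in (22) and (23)

# Hypothetical objective vectors (v1, v2) with ties in v1 and in v2.
points = [(10.0, 12.0), (10.0, 10.0), (4.0, 5.0), (0.0, 0.0), (1.0, 0.0)]

# Problem (22): maximize v1 - EPS * v2 (ties in v1 go to the cheaper point).
s1 = max(points, key=lambda p: p[0] - EPS * p[1])
# Problem (23): minimize v2 - EPS * v1 (ties in v2 go to the more productive point).
s2 = min(points, key=lambda p: p[1] - EPS * p[0])

v1_plus, v2_plus = s1
v1_minus, v2_minus = s2
ideal = (v1_plus, v2_minus)
print(s1, s2, ideal)
```

Without the \(\varepsilon \) terms, either of the tied points could be returned, including one that is dominated by the other.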

Next, the Weighted Augmented Tchebycheff problem is defined. Given the weight \(0< \lambda < 1\), the scalar subproblem whose optimal solution, \(\textbf{s}^*_\lambda \), is Pareto optimal for the original bi-objective problem is defined as follows:

$$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_\lambda & = & \text {argmin} \left\{ \max \left\{ \lambda \cdot \dfrac{v_1^+ - v_1(\textbf{s})}{v_1^+ - v_1^-}, (1-\lambda ) \cdot \dfrac{v_2(\textbf{s}) - v_2^-}{v_2^+ - v_2^-} \right\} + \varepsilon \cdot \left[ -v_1(\textbf{s}) + v_2(\textbf{s})\right] \right\} \\ \text {subject to} & & \\ & & \textbf{s} \in \mathcal {S}. \end{array} \end{aligned}$$
(24)

The objective is to determine a solution \(\textbf{s} \in \mathcal {S}\) whose maximum weighted deviation from the ideal point is minimized. Since the functions \(v_1\) and \(v_2\) have different orders of magnitude, their deviations are normalized by dividing them by the constants \(v_1^+ - v_1^-\) and \(v_2^+ - v_2^-\), respectively. For each choice of \(\lambda \), an efficient solution is determined. As \(\lambda \rightarrow 0\), efficient solutions with lower production volumes and cost levels are determined. On the other hand, as \(\lambda \rightarrow 1\), efficient solutions with maximum outputs and costs are achieved. In particular, the weight \(\lambda =\frac{1}{2}\) assigns the same weight to the two objectives, being, in most cases, like the model proposed in this study. The additional term \(\varepsilon \cdot \left[ -v_1(\textbf{s}) + v_2(\textbf{s})\right] \) serves only to prevent weakly efficient solutions.
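To see why the augmentation term matters, the sketch below evaluates the objective of (24) on two hypothetical points with the same production but different costs, so that the more expensive one is only weakly efficient. The reference values and the points are assumptions chosen so that the plain Tchebycheff term ties.

```python
EPS = 1e-5
# Hypothetical lexicographic values used for normalization.
V1_PLUS, V1_MINUS = 10.0, 0.0
V2_PLUS, V2_MINUS = 10.0, 0.0

def tchebycheff_term(v1, v2, lam):
    """Largest weighted normalized deviation from the ideal point (v1+, v2-)."""
    return max(lam * (V1_PLUS - v1) / (V1_PLUS - V1_MINUS),
               (1 - lam) * (v2 - V2_MINUS) / (V2_PLUS - V2_MINUS))

def augmented_objective(v1, v2, lam):
    """Objective of (24): Tchebycheff term plus EPS * (-v1 + v2)."""
    return tchebycheff_term(v1, v2, lam) + EPS * (-v1 + v2)

a = (6.0, 4.0)  # weakly efficient: same production as b, higher cost
b = (6.0, 3.0)  # dominates a
lam = 0.5

print(tchebycheff_term(*a, lam), tchebycheff_term(*b, lam))
print(augmented_objective(*a, lam), augmented_objective(*b, lam))
```

The plain Tchebycheff term cannot distinguish the two points, while the augmented objective is strictly smaller for the point with the lower cost.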

The subproblem (24) can be rewritten in linear form as follows:

$$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_\lambda & = & \text {argmin} \left\{ u + \varepsilon \cdot \left[ -v_1(\textbf{s}) + v_2(\textbf{s})\right] \right\} \\ \text {subject to} & & \\ & & \textbf{s} \in \mathcal {S}\\ & & \lambda \cdot \dfrac{v_1^+ - v_1(\textbf{s})}{v_1^+ - v_1^-} \le u\\ & & (1-\lambda ) \cdot \dfrac{v_2(\textbf{s}) - v_2^-}{v_2^+ - v_2^-} \le u \\ & & u \ge 0.\\ \end{array} \end{aligned}$$
(25)

Note that any solution \(\textbf{s} \in \mathcal {S}\) is also feasible for Problem (25), for a sufficiently large value of \(u\). This aspect will be crucial for obtaining efficient solutions to the bi-objective problem in a reasonable CPU time. Section 5 proposes a MIP-heuristic approach for determining feasible integer solutions for problems related to the original model (5)–(21) and shows how these solutions can assist in obtaining exact solutions for these problems.
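The equivalence between (24) and (25) can be verified with a short argument. For a fixed \(\textbf{s} \in \mathcal {S}\), the smallest value of \(u\) that satisfies the three constraints on \(u\) in (25) is

$$\begin{aligned} u^*(\textbf{s}) = \max \left\{ \lambda \cdot \dfrac{v_1^+ - v_1(\textbf{s})}{v_1^+ - v_1^-}, \; (1-\lambda ) \cdot \dfrac{v_2(\textbf{s}) - v_2^-}{v_2^+ - v_2^-}, \; 0 \right\} . \end{aligned}$$

Assuming that Problems (22) and (23) are solved to optimality, \(v_1(\textbf{s}) \le v_1^+\) and \(v_2(\textbf{s}) \ge v_2^-\) hold for every \(\textbf{s} \in \mathcal {S}\), so both deviations are non-negative and the zero in the maximum is never the only active term. Since \(u\) is minimized, \(u = u^*(\textbf{s})\) at any optimal solution, and the objective of (25) coincides with that of (24).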

5 A MIP-heuristic approach

This section proposes a MIP-heuristic approach for solving the bi-objective model. Motivated by previous studies and by the success and good performance of the RF&FO approach in combinatorial problems (especially in Poltroniere et al. (2021)), an adapted version of this approach is used to deal with the multi-period sugarcane planting and harvesting scheduling problem. Furthermore, the problem deals with conflicting objectives and, since scalarization techniques are used to solve it, mono-objective subproblems must be optimized multiple times. This further justifies the use of this heuristic approach in the present case.

As presented in the computational results section, the original model cannot be solved by exact methods using real instances in a reasonable CPU time. More specifically, preliminary computational experiments showed that obtaining a feasible planting and harvesting schedule for the \(\vert C \vert \) cycles was impossible. This is due to model characteristics such as weak linear relaxation and the high number of binary variables involved. Therefore, the proposition of the MIP-heuristic has two goals: (i) to provide approximations for the different Pareto optimal solutions to the original problem and (ii) to use these approximate solutions as incumbents for applying exact methods (such as branch-and-bound) using the Gurobi software. The result is a combined approach that can generate approximate and exact Pareto optimal solutions for the problem under consideration. The advantage of this procedure is that it can generate good quality approximations in a reasonable CPU time while, at the same time, the quality of these approximations is evaluated.

The core of the MIP-heuristic is to decompose the original problem over harvest cycles and sequentially determine the harvest schedule using fixed harvest variables determined in the previous year. The MIP-heuristic operates differently for each of the two types of solutions to be determined: the lexicographic solutions (\(\textbf{s}_1^*\) and \(\textbf{s}_2^*\)) and the efficient ones \((\textbf{s}_{\lambda }^*)\) for each chosen \(\lambda \).

The procedure described in this section is combined with an exact optimization strategy in order to evaluate the quality of the approximate solutions. Moreover, the exact method uses the approximate (incumbent) solutions to refine and search for more promising solutions. The incumbent solutions generated by the proposed approach accelerate convergence to the optimal solution because all the computational effort is devoted to pruning the branches of the search tree. Preliminary computational tests showed that, without them, branch-and-bound could not find feasible integer solutions to this problem after many hours of execution.

The stages of the MIP-heuristic for determining approximate solutions (Steps 1 to 2) and exact solution (Step 3) for the lexicographic solutions to Problem (5)–(21) are defined as follows:

  1. 1.

    Step 1. Determining the first-year planting and harvesting for the approximate lexicographic solutions. The variables \(t_{ijchd}\), for all \(i \in V\), \(j \in J\), \(c \in C \setminus \{1\}\), \(h \in T_H\) and \(d \in D\), are relaxed to \(0 \le t_{ijchd} \le 1\), and the variables \(t_{ij1hd}\) are kept as binary, i.e., the interest is in determining a planting and harvesting schedule only for the first year (\(c=1\)). For the year \(c=1\), the demands for sucrose \(\overline{\sigma }_m^s\) and fiber \(\overline{\sigma }_m^e\) are set as follows:

    $$\begin{aligned} \overline{\sigma }_m^s = \max _{c \in C} \{ \sigma _{mc}^s \text { for each } h \in T_H \}, \end{aligned}$$
    (26)

    and

    $$\begin{aligned} \overline{\sigma }_m^e = \max _{c \in C} \{ \sigma _{mc}^e \text { for each } h \in T_H \}. \end{aligned}$$
    (27)

    This is done to guarantee planting that meets the demands for the remaining years. Then, consider the following constraints:

    $$\begin{aligned} \begin{array}{llll} & \text {Constraints } (7)-(11) \text { and } (14)\\ & \displaystyle \sum _{i \in V} \sum _{p \in T_P} i \cdot x_{ijp} = \sum _{i \in V}\sum _{h \in T_H}\sum _{d \in D} i \cdot t_{ij1hd}, \quad & & j \in J \\ & \displaystyle \sum _{i \in V}\sum _{h \in T_H}\sum _{d \in D} t_{ij1hd} = 1, & & j \in J \\ & \text {Constraints } (16) \text { for } c=1, \text {RHS equal to } \overline{\sigma }_m^s\\ & \text {Constraints } (17) \text { for } c=1, \text {RHS equal to } \overline{\sigma }^e_m\\ & \text {Constraints } (18) \text { for } c=1 \\ & x_{ijp} \in \{0,1\}, & & i \in V, \, j \in J, \, p \in T_P \\ & y_j \in \{0,1\}, & & j \in J \\ & t_{ij1hd} \in \{0,1\}, & & i \in V,\ j \in J,\ h \in T_H, \ d \in D. \end{array} \end{aligned}$$
    (28)

    Constraints (28) define the same conditions as constraints (7)–(21) particularized for \(\bar{c}=1\), except for constraints (15) that connect one cycle to the other. Define as \(\mathcal {S}_1\) the space of variables \(x_{ijp}\), \(y_j\) and \(t_{ij1hd}\) generated by the constraints (28) and \(\textbf{s}_1 \in \mathcal {S}_1\).

    1. (a)

      Step 1a: Determination of the approximate maximum production for the first cut. To maximize the production of the first cut, Problem (29) is solved:

      $$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_{1,1} & = & \text {argmax} \left\{ v_1(\textbf{s}_1) - \varepsilon \cdot v_2(\textbf{s}_1) \right\} \\ \text {subject to} & & \\ & & \textbf{s}_1 \in \mathcal {S}_1. \end{array} \end{aligned}$$
      (29)

      The optimal vector \(\textbf{s}^*_{1,1}\) contains all the variables \(x_{ijp}^*\), \(y_{j}^*\) and \(t_{ij1hd}^*\) that optimize production for the first cut.

    2. (b)

      Step 1b: Determination of the approximate minimum cost for the first cut. To minimize the harvesting cost in the first cut, Problem (30) is solved:

      $$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_{1,2} & = & \text {argmin} \left\{ v_2(\textbf{s}_1) - \varepsilon \cdot v_1(\textbf{s}_1)\right\} \\ \text {subject to} & & \\ & & \textbf{s}_1 \in \mathcal {S}_1. \end{array} \end{aligned}$$
      (30)

      The optimal vector \(\textbf{s}^*_{1,2}\) contains all the variables \(x_{ijp}^*\), \(y_{j}^*\) and \(t_{ij1hd}^*\) that minimize the cost for the first cut.

  2. 2.

    Step 2: Determination of an approximate harvesting for subsequent years. For each \(\bar{c} \in \{2,3,\ldots ,\vert C \vert \}\), fix the harvesting variables of the year \(\bar{c}-1\) (\(t_{ij(\bar{c}-1)hd}\) at \(t_{ij(\bar{c}-1)hd}^*\)) to determine the harvesting for the year \(\bar{c}\) by optimizing only the variables \(t_{ij\bar{c}hd}\). In particular, for \(\bar{c}=2\), \(t^*_{ij(\bar{c}-1)hd}\) are the optimal variables provided by either Problem (29) or (30) for the first-year harvest. The subproblem to be solved, for each \(\bar{c} \in C {\setminus } \{1\}\), has the following sets of constraints:

    $$\begin{aligned}&t_{ij(\bar{c}-1)hd} = t^*_{ij(\bar{c}-1)hd}, & i \in V,\, j \in J, \ h \in T_H, \, d \in D \end{aligned}$$
    (31)
    $$\begin{aligned}&\text {Constraints } (13)-(18) \text { for } c=\bar{c} \end{aligned}$$
    (32)
    $$\begin{aligned}&t_{ij\bar{c}hd} \in \{0,1\}, & i \in V, \, j \in J, \, h \in T_H,\ d \in D. \end{aligned}$$
    (33)

    Constraints (31) fix the harvest cycle variables for the year \(\bar{c}-1\). The conditions (32) impose a unique harvest for each plot j, force the same variety harvested in year \(\bar{c}-1\) to be harvested in year \(\bar{c}\), establish the relationship between the harvesting month of the previous year and the following one, satisfy the sucrose and fiber demand for the current year and impose the milling capacity limit. Finally, (33) defines the decision variables’ domain. Define as \(\mathcal {S}_{\bar{c}}\) the space of variables \(t_{ij\bar{c}hd}\) generated by the constraints (31)–(33) and \(\textbf{s}_{\bar{c}} \in \mathcal {S}_{\bar{c}}\) for each \(\bar{c} \in \{2,3,\ldots ,\vert C \vert \}\). Note that the planting variables (\(x_{ijp}\)) and those choosing the life cycle (\(y_j\)) are not considered in this set of constraints because they have already been determined in Step 1. The procedure is done sequentially; that is, the optimization in cycle \(\bar{c}=1\) provides the variables to form the constraint set \(\mathcal {S}_{2}\), the optimization in cycle \(\bar{c}=2\) provides the variables to form the constraint set \(\mathcal {S}_3\), and so on.

    1. (a)

      Step 2a: Determining the maximum production approximation for subsequent cuts. To maximize the production of cut \(\bar{c}\), Problem (34) is solved:

      $$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_{\bar{c},1} & = & \text {argmax} \left\{ v_1(\textbf{s}_{\bar{c}}) - \varepsilon \cdot v_2(\textbf{s}_{\bar{c}}) \right\} \\ \text {subject to} & & \\ & & \textbf{s}_{\bar{c}} \in \mathcal {S}_{\bar{c}}. \end{array} \end{aligned}$$
      (34)

      The optimal vector \(\textbf{s}^*_{\bar{c},1}\) contains the optimal variables \(t_{ij\bar{c}hd}^*\) that determine the harvest schedule in year \(\bar{c}\) for maximum production volumes.

    2. (b)

      Step 2b: Determination of the minimum cost approximation for subsequent cuts. To minimize the cost of cut \(\bar{c}\), Problem (35) is solved:

      $$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_{\bar{c},2} & = & \text {argmin} \left\{ v_2(\textbf{s}_{\bar{c}}) - \varepsilon \cdot v_1(\textbf{s}_{\bar{c}})\right\} \\ \text {subject to} & & \\ & & \textbf{s}_{\bar{c}} \in \mathcal {S}_{\bar{c}}. \end{array} \end{aligned}$$
      (35)

      Note that Problems (34) and (35) are easier to solve than the original problem with all the cuts, as are the problems in Step 1. The optimal vector \(\textbf{s}^*_{\bar{c},2}\) contains the optimal variables \(t_{ij\bar{c}hd}^*\) that determine the minimum cost for year \(\bar{c}\).

    Although the solutions \(\textbf{s}^*_{c,1}\) and \(\textbf{s}^*_{c,2}\) are optimal for each \(c \in C\), they are approximations to the solution of the original problem, where planting is decided considering all \(\vert C \vert \) harvest cycles. After applying Step 1 and Step 2, the approximations to the lexicographic solutions of the bi-objective problem have been determined. Define the approximation for \(\textbf{s}_1^*\) (of maximum production volumes) as \(\displaystyle \textbf{s}_1^A = \bigcup _{c=1}^{\vert C \vert } \left\{ \textbf{s}^*_{c, 1} \right\} \) and the approximation for \(\textbf{s}_2^*\) (of minimum cost) as \(\displaystyle \textbf{s}_2^A = \bigcup _{c=1}^{\vert C \vert } \left\{ \textbf{s}^*_{c,2} \right\} \). The approximations for the lexicographic points of the Pareto frontier are given by \(\textbf{v}_1^A = \left( v_1(\textbf{s}^A_1), v_2(\textbf{s}^A_1) \right) ^\top = \left( v_1^A,v_2^A \right) ^\top \) and \(\textbf{v}_2^A = \left( v_1(\textbf{s}^A_2),v_2(\textbf{s}^A_2) \right) ^\top = \left( v_1^A,v_2^A \right) ^\top \).

  3. 3.

    Step 3: determining exact lexicographic solutions (Mipstart). This step involves solving the mono-objective Problems (22) and (23) using the exact branch-and-bound algorithm for all \(\vert C \vert \) cuts. The approximate lexicographic solutions, \(\textbf{s}_1^A\) and \(\textbf{s}_2^A\), obtained in Steps 1 and 2 are used as incumbents for the branch-and-bound tree to determine \(\textbf{s}_1^*\) and \(\textbf{s}_2^*\), respectively. This immediately determines the components of the ideal vector \(\textbf{I}=\left( v_1^+,v_2^- \right) ^\top \). Define the optimal values for the harvest variables of the two exact lexicographic solutions (maximum production and minimum cost, respectively) as \(t_{ijchd}^{*,1}\) and \(t_{ijchd}^{*,2}\). Furthermore, let the objective values of \(v_1\) and \(v_2\) of these solutions for each cycle be calculated as follows:

    $$\begin{aligned} \begin{array}{rcl} v_{c,1,1}^* & = & v_1(t_{ijchd}^{*,1}), \quad \text {for each } c \in C \\ v_{c,2,1}^* & = & v_2(t_{ijchd}^{*,1}), \quad \text {for each } c \in C \end{array} \end{aligned}$$
    (36)

    and

    $$\begin{aligned} \begin{array}{rcl} v_{c,1,2}^* & = & v_1(t_{ijchd}^{*,2}), \quad \text {for each } c \in C \\ v_{c,2,2}^* & = & v_2(t_{ijchd}^{*,2}), \quad \text {for each } c \in C. \end{array} \end{aligned}$$
    (37)

    These values, computed for each harvest \(\bar{c}\), will be the reference points for applying the MIP-heuristic to calculate approximate (Steps 4 and 5) and exact (Step 6) efficient solutions in the next steps.

  4. 4.

    Step 4: determining the approximate efficient solution for the first year for the planting and harvesting. For each \(0< \lambda < 1\) assigned, a planting and harvesting schedule for the first year (\(c=1\)) can be determined by establishing a trade-off between production volumes and cost via the Augmented Tchebycheff subproblem. The aspiration point to be approximated is \(\textbf{I}_1 = \left( v_{1,1,1}^*,v_{1,2,2}^* \right) ^\top \) obtained using (36) and (37), i.e., the highest production and the lowest cost values for year \(c=1\). Then the following Augmented Tchebycheff problem (presented in its linearized form) is solved:

    $$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_{1,\lambda } & = & \text {argmin} \left\{ u + \varepsilon \cdot \left[ -v_1(\textbf{s}_1) + v_2(\textbf{s}_1)\right] \right\} \\ \text {subject to} & & \\ & & \textbf{s}_1 \in \mathcal {S}_1\\ & & \displaystyle \lambda \cdot \frac{v_{1,1,1}^* - v_1(\textbf{s}_1)}{v_{1,1,1}^* - v_{1,1,2}^*} \le u\\ & & \displaystyle (1-\lambda ) \cdot \frac{v_2(\textbf{s}_1) - v_{1,2,2}^*}{v_{1,2,1}^* - v_{1,2,2}^*} \le u \\ & & u \ge 0.\\ \end{array} \end{aligned}$$
    (38)

    Note that the feasible spaces of Problems (29), (30) and (38) (cf. Steps 1a and 1b) are the same with respect to the original variables. The solution \(\textbf{s}^*_{1,\lambda }\) contains the harvesting variables \(t_{ij1hd}^{A,\lambda }\) of the approximate efficient solution for the first cut.

  5. 5.

    Step 5: determining approximations for efficient solutions to be harvested in subsequent years. The remaining harvests are determined sequentially from the scheduling of the first cut, as was done in Step 2. However, the Augmented Tchebycheff subproblem must be solved for each cycle with respect to the aspiration point \(\textbf{I}_{\bar{c}} = \left( v_{{\bar{c}},1,1}^*,v_{{\bar{c}},2,2}^* \right) ^\top \) (cf. (36) and (37)); that is, Problem (39) is solved for \(\bar{c} \in \{2,3,\ldots ,\vert C \vert \}\):

    $$\begin{aligned} \begin{array}{rcl} \displaystyle \textbf{s}^*_{\bar{c},\lambda } & = & \text {argmin} \left\{ u + \varepsilon \cdot \left[ -v_1(\textbf{s}_{\bar{c}}) + v_2(\textbf{s}_{\bar{c}})\right] \right\} \\ \text {subject to} & & \\ & & \textbf{s}_{\bar{c}} \in \mathcal {S}_{\bar{c}}\\ & & \displaystyle \lambda \cdot \frac{v_{\bar{c},1,1}^* - v_1(\textbf{s}_{\bar{c}})}{v_{\bar{c},1,1}^* - v_{\bar{c},1,2}^*} \le u\\ & & \displaystyle (1-\lambda ) \cdot \frac{v_2(\textbf{s}_{\bar{c}}) - v_{{\bar{c}},2,2}^*}{v_{{\bar{c}},2,1}^* - v_{{\bar{c}},2,2}^*} \le u \\ & & u \ge 0.\\ \end{array} \end{aligned}$$
    (39)

    Note that the feasible space for this subproblem is the same as for the subproblems (34) and (35) (cf. Steps 2a and 2b). After determining the \(\vert C \vert \) harvest cycles, a feasible approximate efficient solution to the original problem, \(\displaystyle \textbf{s}_\lambda ^A = \bigcup _{c=1}^{\vert C \vert } \left\{ \textbf{s}^*_{c,\lambda } \right\} \), is obtained. It is important to highlight that the added constraints (that weigh and limit the deviations of the objective functions from the components of the ideal vector) do not eliminate any integer solutions of the original problem. Minimizing \(u+\varepsilon (-v_1+v_2)\) forces the model to find approximations to the efficient solutions whose image is in the intermediate portions of the Pareto frontier. Furthermore, the feasibility of these subproblems is always guaranteed, since conditions (26) and (27) imposed in the first cut prevent infeasibility in the subsequent years.

  6. 6.

    Step 6: determining exact efficient solutions (Mipstart). Efficient solutions for the \(\vert C \vert \) cuts are determined with an exact method by introducing the heuristically obtained solution \(\textbf{s}^A_{\lambda }\) as the incumbent in the branch-and-bound tree. Problem (24) is then solved. Since the branch-and-bound tree starts from an incumbent solution, all computational effort is devoted to obtaining the optimal solution \(\textbf{s}_\lambda ^*\) of this problem.
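The cycle-by-cycle logic of the heuristic steps can be sketched as a simple control flow: solve the first cycle, fix it, and use it to restrict the next one. The example below is a toy illustration for a single plot, with brute-force enumeration standing in for the MIP subproblems. The sets `H` and `D`, the table `GAIN` and all function names are hypothetical, and the only linking rule kept is that \(h-d\) must be preserved between consecutive cycles, in the spirit of (15).

```python
D = (-1, 0, 1)      # hypothetical deviations from the maturation month
H = range(4, 12)    # hypothetical harvest months
# Hypothetical production gain per cycle and deviation.
GAIN = {1: {-1: 8, 0: 10, 1: 9},
        2: {-1: 7, 0: 6, 1: 9},
        3: {-1: 5, 0: 4, 1: 3}}

def solve_first_cycle():
    """Stand-in for Step 1: best (h, d) for the first cut."""
    return max(((h, d) for h in H for d in D), key=lambda hd: GAIN[1][hd[1]])

def solve_next_cycle(c, previous):
    """Stand-in for Step 2: previous cycle is fixed, h - d must be preserved."""
    base = previous[0] - previous[1]
    candidates = [(h, d) for h in H for d in D if h - d == base]
    return max(candidates, key=lambda hd: GAIN[c][hd[1]])

def sequential_schedule(n_cycles):
    """Solve cycle 1, then each later cycle with the previous one fixed."""
    schedule = [solve_first_cycle()]
    for c in range(2, n_cycles + 1):
        schedule.append(solve_next_cycle(c, schedule[-1]))
    return schedule  # union of the per-cycle solutions

schedule = sequential_schedule(3)
print(schedule)
```

Each cycle is optimal only for its own subproblem, so the assembled schedule is in general an approximation, which is why it is then passed to the exact method as an incumbent.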

This combined technique generates \(2(n+2)\) subproblems, where n is the number of efficient solutions obtained by varying \(0< \lambda < 1\), a value provided a priori by the manager. Half of the resulting solutions are approximate and serve as starting points for the exact methods, which produce the other half. The flowchart presented in Fig. 1 illustrates the different steps of the MIP-heuristic, highlighting the alternation between the heuristic and exact approaches and the dependence of one on the other. In this diagram, the labels “Exact\(^L\)” and “Exact\(^C\)” distinguish the exact methods, implemented by the commercial solvers, that improve the heuristic solutions into the lexicographic and compromise exact solutions, respectively.
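As an illustration of the compromise scalarization in Subproblem (39), the following minimal Python sketch applies the same augmented weighted Tchebycheff rule to a small, hypothetical list of objective vectors instead of the full MIP. The function name `tchebycheff_pick`, the toy points and the use of the extreme values of the list as best and worst levels are assumptions made for illustration only; they are not part of the Julia/JuMP implementation used in this study.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (production v1 to maximize, cost v2 to minimize)


def tchebycheff_pick(points: List[Point], lam: float, eps: float = 1e-5) -> int:
    """Return the index of the point minimizing u + eps * (-v1 + v2)."""
    v1_best = max(p[0] for p in points)   # best production level in the list
    v1_worst = min(p[0] for p in points)  # worst production level in the list
    v2_best = min(p[1] for p in points)   # best (lowest) cost in the list
    v2_worst = max(p[1] for p in points)  # worst (highest) cost in the list
    range1 = (v1_best - v1_worst) or 1.0  # guard against a zero range
    range2 = (v2_worst - v2_best) or 1.0

    best_index, best_value = -1, float("inf")
    for index, (v1, v2) in enumerate(points):
        dev1 = lam * (v1_best - v1) / range1          # weighted production deviation
        dev2 = (1.0 - lam) * (v2 - v2_best) / range2  # weighted cost deviation
        u = max(dev1, dev2, 0.0)      # smallest u satisfying both constraints
        value = u + eps * (-v1 + v2)  # augmentation term, as in (39)
        if value < best_value:
            best_index, best_value = index, value
    return best_index


# Hypothetical objective vectors (production, cost); not data from the study.
toy_points = [(100.0, 90.0), (90.0, 60.0), (80.0, 50.0), (60.0, 40.0)]
for lam in (0.1, 0.5, 0.9):
    print(lam, toy_points[tchebycheff_pick(toy_points, lam)])
```

With these toy values, \(\lambda =0.5\) selects an intermediate point, whereas weights closer to 1 or 0 move the choice towards the maximum-production or minimum-cost end of the list.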

Fig. 1 Flowchart of MIP-heuristic approach combined with exact method

The following section presents the computational experiments to validate the proposed mathematical model and its solution approach.

6 Computational experiments

The mathematical model and solution approach proposed in this study were tested in two phases. In the first, presented in Sect. 6.1, the parameters are inspired by practical data provided by sugarcane mills in the state of São Paulo, Brazil. Five instances reflecting the different sizes of mills are solved; the focus is on evaluating the efficient solutions found, and some metrics are employed to compare the strategic plans obtained. In the second, presented in Sect. 6.2, computational tests on semi-randomly generated instances evaluate the potential of the proposed approach.

6.1 Instances inspired by real-life cases

This section presents and discusses the computational results obtained with the proposed heuristic on real-case data from sugarcane mills in the state of São Paulo, Brazil. The optimization model was solved with the Gurobi software (Gurobi Optimization, 2022), whose solver implements exact branch-and-bound and branch-and-cut algorithms. The model and the proposed MIP-heuristic were implemented in the Julia programming environment, version 1.0.4 (Bezanson et al., 2017), using the JuMP modeling language, version 0.20 (Dunning et al., 2017). The seed of the random generator for the uniform distribution was set to “2022”. All codes and data are available in the repository at https://github.com/angeloaliano/multiperiod_sugarcane. As stopping criteria, the solver was halted on each scalar subproblem, in all six steps, after a maximum time of two hours or when the Gap fell below 0.50%. Although some solutions were not solved to proven optimality (Gap = 0.0%) because the maximum allowed Gap was attained first, the terms “Efficient Solution” and “Pareto optimal” are used for the determined schedules. The computational tests were run on a computer with an Intel Core i7-2450 M processor at 2.50 GHz, 8.0 GB of RAM and a 64-bit operating system. In all experiments, \(\varepsilon =10^{-5}\) was adopted.
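The stopping rule described above can be sketched in a few lines of Python. This is a hedged illustration rather than the solver's own logic: the function names `relative_gap` and `should_stop` are hypothetical, and the Gap is assumed to be the absolute difference between the best bound and the incumbent value, relative to the incumbent value.

```python
def relative_gap(incumbent: float, bound: float) -> float:
    """Relative optimality gap in percent (assumed definition, see lead-in)."""
    if incumbent == 0:
        return float("inf")  # gap is undefined for a zero incumbent value
    return abs(bound - incumbent) / abs(incumbent) * 100.0


def should_stop(elapsed_s: float, incumbent: float, bound: float,
                time_limit_s: float = 7200.0, gap_tol_pct: float = 0.5) -> bool:
    """Stop when the two-hour limit is reached or the Gap is below 0.50%."""
    return elapsed_s >= time_limit_s or relative_gap(incumbent, bound) < gap_tol_pct


# Hypothetical values: a 4% gap after 100 s does not trigger the stop.
print(should_stop(100.0, 100.0, 104.0))
# The same gap at the two-hour limit does.
print(should_stop(7200.0, 100.0, 104.0))
```

Under this rule, a run that ends at the time limit may still report a Gap above the tolerance, which is why some schedules in this section are described as potentially Pareto optimal.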

The computational tests were carried out for four cuts (harvests) (\(C=\{1,2,3,4\}\), index c) in a sugar-energy mill that obtains sucrose and fiber from energy-cane and sucrose cane farms located in the state of São Paulo (south-central region) in Brazil. Five different semi-random instances inspired by the reality of the mills, with \(\vert J \vert \) equal to 30, 65, 150, 300 and 500 plots (index j), were considered. The total number of plots in each of these instances is an attempt to reflect the different sizes, capacities and economic power of mills in this region. A maximum of 500 plots was used because this is approximately the largest number of plots that a large mill can manage simultaneously. Furthermore, more concentrated planning facilitates the targeting of resources on the formation of harvesting fronts in a limited coverage area. In fact, the mills can cultivate more than 2000 plots but, in such cases, the strategic planning of planting and harvesting is carried out in a sectorized manner, subdividing the area into a few macro-regions that involve groups of nearby farms. It is important to emphasize that the operational costs of planting, cultivation, harvesting (including displacement of machines) and transportation, \(\beta ^e_{jc}\) and \(\beta ^s_{jc}\) (inspired by Bigaton et al. (2017)), are integrated values. These costs are calculated considering an average distance of 25 km between the plot (cultivated area) and the mill. The other data necessary for these experiments are detailed in the tables of the Supplementary Material of this paper.

Steps 1–6 determine, in total, six solutions for each instance, three being approximate (\(\textbf{s}_1^A\), \(\textbf{s}_2^A\) and \(\textbf{s}_{0.5}^A\)) and three exact (\(\textbf{s}_1^*\), \(\textbf{s}_2^*\) and \(\textbf{s}_{0.5}^*\)). The solutions with index 1 maximize the production volumes and those with index 2 minimize the cost. The discussion will analyze the quality of heuristic solutions compared to exact ones and the trade-off between the objectives.

Preliminary tests showed that, within a limit of two hours of CPU time, exact methods alone (without auxiliary methods to generate incumbent solutions) could not produce integer feasible solutions for instances with \(\vert J \vert =65\) plots or more, regardless of the objective to be optimized. Columns 3 and 4 of Table 2 show, respectively, the CPU time elapsed with the application of branch-and-bound and the associated gap (when an integer solution is found). The only instance for which Pareto optimal solutions were obtained was \(\vert J \vert =30\). In the other instances, two hours was insufficient for the solver to find any feasible integer solution. This illustrates the combinatorial nature of the problem and the limited applicability of exact methods, and it was the fundamental motivation for the method developed in this study. As shown in the following results, the use of the exact approach was only possible because the MIP-heuristic determines good-quality solutions that are inserted into the root node of the main problem tree. Accordingly, in the following tables and figures, all results attributed to the exact method refer to the branch-and-bound method started from the integer incumbent solution generated by the MIP-heuristic. In Table 2, the results of the proposed approach are reported in columns 5 to 7, whereas the results of the exact method without incumbent solutions are shown in columns 3 and 4.

Table 2 Comparative performance of exact (with and without incumbent solutions) and MIP-heuristic approaches

To analyze the degree of conflict between the objectives \(v_1\) and \(v_2\) and how one interferes with the other, three efficient solutions were determined for each instance using the approximate and exact approaches. Two of these are the lexicographic solutions (Steps 1–3, which optimize production volumes and cost individually), and the third establishes an equal compromise between the two goals. The latter is determined in Steps 4–6 by setting \(\lambda =0.5\), which yields an efficient solution whose image in the criterion space is a non-dominated point as close as possible to the ideal point in terms of the Tchebycheff metric. This solution, as previously defined, is called a compromise solution. These three solutions can be analyzed in detail and compared. The procedure can determine other efficient solutions by assigning different values to \(\lambda \); however, this would lengthen the text while adding little to the analysis.

Table 2 also reports the computational time (in seconds) required by the exact method (with the incumbent solution) to obtain the different (potentially) Pareto optimal solutions over the four years of harvest (column 5) and the %Gap of the determined solutions (column 6). These two columns are associated only with Step 3 (for \(k=1,2\)) or Step 6 (for \(k=0.5\)) of the procedure illustrated in Fig. 1, that is, with the branch-and-bound method (Exact\(^L\) or Exact\(^C\)) implemented by Gurobi. This step starts the branching from the incumbent solution previously determined by the heuristic (Mipstart). Then, the exact method is applied to obtain the optimal solution of the problem.

The CPU time of the MIP-heuristic for determining each feasible integer solution is shown in column 7 of Table 2 and is associated only with Steps 1 and 2 (for \(k=1,2\)) or Steps 4 and 5 (for \(k=0.5\)) of the flowchart presented in Fig. 1. Note that the lexicographic solutions \(\textbf{s}_1^*\) are more easily determined than \(\textbf{s}^*_2\) and \(\textbf{s}^*_{0.5}\). This is supported by the fact that the imposed maximum Gap (less than the required tolerance of 0.50%) was achieved with a CPU time of 5878 s for the largest instance. In contrast, the compromise solution is the most computationally costly because it optimizes both objectives with equal weight.

Evidently, as the number of plots increases, so does the difficulty of obtaining Pareto optimal solutions. Up to the instance with 150 plots, the solutions determined by the exact method are within a maximum error of 0.50% of the optimal value, with a CPU time of less than 10 min. For the two larger instances, the error gradually increases. This can be seen in the least-cost and compromise solutions, where the highest values of %Gap and CPU time were observed: the solutions determined for these instances are within just over 4% of the optimal value. Given the complexity of the model and the size of the instances solved, this is a highly satisfactory value. For example, the instance with 500 plots has more than 2.8 million binary variables (cf. the Supplementary Material). The excellent quality of these solutions is mainly due to the use of Mipstart in this approach. A moderate increase in computational effort was observed in the MIP-heuristic, mainly in the determination of the \(\textbf{s}^A_2\) solution. Most of its computational time (about 90%) was spent on deciding the planting and harvesting for the first cycle (Steps 1 and 4). However, this time is much less than that of the exact method. This shows the potential of the developed MIP-heuristic, which fixes variables and decomposes the problem by cycle (cut), solving smaller subproblems in the branch-and-bound tree, each with almost 1/4 of the number of variables of the original problem.

In order to study the effect of other MIP-heuristics on this problem, such as Relaxation Induced Neighborhood Search (RINS, see Danna et al. (2005)) and Local Branching (see Fischetti and Lodi (2003)), \(\textbf{s}_{0.5}^*\) was determined by solving Subproblem (24), and the lower and upper bounds (LB and UB) were measured for the instance with \(\vert J\vert =65\) plots. A maximum time limit of 12 h was given to Gurobi, without the insertion of any initial solution. These heuristics are implemented internally by the solver, and the Heuristics parameter controls the fraction of runtime spent on them. For example, Heuristics = 0.05 (the default value) means that 5% of the runtime is devoted to heuristics (see Footnote 1). Fig. 2 illustrates the progress of the bounds for three values of this parameter: 0.05, 0.30 and 0.60. In the solver's default configuration, an integer solution is determined only after 4 h of processing. As expected, increasing the focus on heuristics has a positive effect: as this parameter increases, an integer solution is obtained in a shorter time. However, optimality is not proven in any run, and there is little progress in the bounds after a solution is found. It took about 2 h to obtain a feasible solution with a Gap of 8% when Heuristics = 0.6. After more than 10 h of processing, the Gap is reduced to 3.85% with an objective value of 0.42477. Calculations with the data in Table 3 show that the objective value of Subproblem (24) determined by the MIP-heuristic is 0.42473. This experiment suggests that the heuristics incorporated into the solver, which do not consider the special characteristics of the problem, are less effective than the proposed MIP-heuristic, which obtains a feasible solution of better quality in a much shorter time (10 s).

Fig. 2 Evolution of the bounds in the Tchebycheff Subproblem (24) for \(\lambda =0.5\) in the exact method when a limit time of 12 h is given

Table 3 presents the objective values of the three solutions determined for each instance, comparing the exact and heuristic approaches. The lexicographic solutions obtained with the exact and heuristic methods have the same objective values for all the instances tested. This means the branch-and-bound technique failed to improve the incumbent solutions initially provided within two hours. This result indicates that the proposed MIP-heuristic provides solutions of surprisingly high quality in a short computational time for smaller instances and within the maximum of 2 h (7200 s) for larger instances. More specifically, all the approximate solutions with maximum production volumes have an error of only 0.5% from the optimal value. For the solutions with the minimum cost, this error is just over 3%. The lexicographic solutions offer extreme alternatives to the bi-objective problem, where one objective reaches its best value while the other reaches its worst value considering only solutions from the efficient set, i.e., they are the most unbalanced alternatives with respect to the objectives. The range of these objectives illustrates the total conflict between \(v_1\) and \(v_2\). Considering the largest instance, while the production volume is around 614 thousand tons, the cost is around R$ 151 million. In relative terms and considering all instances, the relative differences between the best- and worst-case scenarios in production volumes and cost are 25% and 34%, respectively. The manager can choose between these solutions to prioritize one objective (production volumes or cost) and know how much the other will lose.

Table 3 Comparison of objective values determined per instance and efficient solution using the exact and approximate approaches

The exact and heuristic algorithms produce solutions that differ from each other for the compromise solutions. Starting from the initial solution \(\textbf{s}^A_{0.5}\), the exact method always produces a solution \(\textbf{s}_{0.5}^*\) using Problem (24) whose value of the cost (output) is less (greater) than or equal to the cost (output) of the approximate solution. Except for the 30-plot instance, although it cannot be guaranteed that \(\textbf{s}_{0.5}^A\) is Pareto optimal, one does not dominate the other (in the Pareto sense), so they are incomparable in the multi-objective sense and can constitute alternative solutions to lexicographic solutions.

Figure 3 illustrates an important managerial insight corresponding to the three solutions determined: the value of the cost per volume produced (R$\(\cdot \)ton\(^{-1}\)). Considering the objective values presented in Table 3, the alternative solutions determine different costs vs. benefits when \(v_1\) and \(v_2\) vary. While in the three larger instances the solution \(\textbf{s}_2\) has a higher cost, in the first two instances the highest cost belongs to \(\textbf{s}_1\). Although the compromise solution does not determine the highest production volume or the lowest cost, it is the alternative that best combines/balances the objectives. It provides the lowest cost per unit produced in all cases. This metric is an important tool in helping to choose a strategic plan for this problem.
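The cost-per-ton metric discussed above is simply the ratio \(v_2/v_1\). A minimal Python sketch, using hypothetical objective values rather than the figures of Table 3, shows how the three alternatives can be ranked by this ratio; the function name `cost_per_ton` and the dictionary keys are assumptions introduced for illustration.

```python
def cost_per_ton(production_tons: float, cost_brl: float) -> float:
    """Cost per unit of production volume, in R$ per ton."""
    if production_tons <= 0:
        raise ValueError("production volume must be positive")
    return cost_brl / production_tons


# Hypothetical (production in tons, cost in R$) pairs; not values from Table 3.
solutions = {
    "s1": (1000.0, 260000.0),   # maximum production
    "s2": (750.0, 190000.0),    # minimum cost
    "s05": (900.0, 215000.0),   # compromise
}
ratios = {name: cost_per_ton(v1, v2) for name, (v1, v2) in solutions.items()}
cheapest_per_ton = min(ratios, key=ratios.get)
print(ratios, cheapest_per_ton)
```

In this toy case the compromise alternative has neither the highest production nor the lowest cost, yet it attains the lowest ratio, which mirrors the pattern reported for Fig. 3.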

Fig. 3 Cost per ton produced in each efficient solution determined

Figure 4 illustrates the differences in production volumes and cost values for the two compromise solutions, \(\textbf{s}_{0.5}^*\) and \(\textbf{s}_{0.5}^A\). The reference for calculating the relative deviations of the objectives is the exact ideal point, \(\textbf{I}^*=\left( v_1^+, v_2^- \right) ^\top \). With respect to the maximum production volumes, the approximate solutions deviate slightly less than the exact solutions. The overall average is only 10%, i.e. the compromise solutions have an average error of approximately 10% compared to the maximum production volumes. In contrast, with respect to cost, the approximate solutions have slightly larger deviations than the exact ones. On average, the approximate compromise solutions have a 14% higher cost than the minimum cost. Considering the range of the objectives, these results show that the compromise solutions have objective values relatively close to the ideal values. Therefore, they constitute alternative, viable and balanced options to the lexicographic solutions, whose objective values are extremes. The weight factor of \(\lambda =0.5\) assigned to each objective deviation in the Tchebycheff Problem is adequate to produce solutions whose objective vector is relatively close to the ideal point.

Fig. 4 Percentage of deviations regarding production volumes of \(\textbf{s}_{0.5}^*\) and \(\textbf{s}_{0.5}^A\) relative to the best level of each objective

In practice, managers at sugarcane mills want a balanced solution that compromises between production volume and cost. With such a solution, they can forecast and size the hiring of labor, identify the ideal moment to rent the machines and equipment needed for the planting and harvesting operations (which are distinct), and sign contracts with outsourced companies. This planning is crucial for seeking alternative suppliers of these resources in the market and exploiting the competition among them, which is indispensable for saving costs while meeting the mill's production demand.

Figure 5 shows the proportion of the area cultivated with each variety i for the instance with 500 plots, considering the three solutions determined by the exact method. Note that, in the \(\textbf{s}_1^*\) solution, sucrose cane varieties 11, 16 and 6 are planted on 29%, 24% and 20% of the total area, respectively, occupying almost 75% of the cultivated area. These varieties are the most productive (see the Supplementary Material) for sucrose and fiber. On the other hand, in the \(\textbf{s}_2^*\) solution, almost 75% of the cultivated area comprises varieties 4, 9 and 10, which are the least costly and consequently the least productive. The choice of planting varieties in the efficient solutions therefore establishes significant differences in harvest planning relative to the lexicographic solutions. The compromise solution mixes high (\(i=16\)) and low (\(i=9\)) productivity varieties, which together represent almost 45% of the planted area. The other varieties, such as 1, 14 and 18, have median production and cost values. For the energy-cane varieties, the same trend is repeated. Varieties 21, 22 and 23 are used more in \(\textbf{s}_1^*\), while varieties 24 and 25 are used more in \(\textbf{s}_2^*\) because of differences between cost and productivity and the priority of each solution. In the compromise solution, varieties 21, 23 (more productive) and 25 (less productive) are used the most. These results show that the conflict between objectives is mainly related to the choice of varieties to be planted, since this choice interferes with the harvest yield in the following four years. When an objective (cost or production volumes) is optimized, the varieties that contribute to it are chosen when solving a problem that is \(25\%\) the size of the original problem (only one cut). This justifies the success of the decomposition and RF&FO used for solving this problem, providing solutions of excellent quality with a reasonable computational effort.

Fig. 5 Percentages of planted area with each variety of the instance with 500 plots along the three Pareto optimal solutions

To conclude the first subsection of the computational results, Fig. 6 illustrates 11 non-dominated points for the instance with \(\vert J \vert =150\) plots produced by the exact and heuristic algorithms. These points, associated with 11 solutions in the decision space, were determined by varying the weight \(\lambda \) uniformly over the interval [0, 1] in steps of 0.1. The ideal point and the non-dominated points determined by the exact and heuristic approaches are highlighted in red, green and blue, respectively. The computational experiments up to this point have only determined three efficient solutions: two lexicographic and one with equal compromise between the objectives, obtained using \(\lambda =0.5\). However, it should be emphasized that the scalarization method, coupled with the proposed heuristic, can determine as many efficient solutions as desired by varying the weight \(\lambda \) of the deviations of the objective values from the coordinates of the ideal vector \(\textbf{I}^*\). For example, a decision-maker more concerned with higher production volumes will adopt \(\lambda \rightarrow 1\) (for instance, \(\lambda \in [0.75,0.9]\)). On the other hand, a decision-maker focused on decreasing cost should use \(\lambda \rightarrow 0\) (for instance, \(\lambda \in [0.1,0.25]\)), and one who wants a more balanced efficient solution should use \(\lambda \in [0.4,0.6]\). Note also that the non-dominated points are well distributed along the Pareto frontier. The figure also shows the good quality of the heuristic solutions, indicated by the points connected by the continuous line, which lie very close to the exact Pareto frontier for this problem (the dashed line). Moreover, while the exact method required 6545 s to determine these 11 points, the heuristic obtained them in 1420 s.
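The notion of non-dominated points used for Fig. 6 can be made concrete with a short Python sketch. It builds the grid of 11 weights and filters a hypothetical list of objective vectors (production to be maximized, cost to be minimized) down to those that no other vector dominates. The helper names `dominates` and `non_dominated` and the candidate values are assumptions for illustration; the points in Fig. 6 come from solving the Tchebycheff problem, not from this filter.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (production v1 to maximize, cost v2 to minimize)


def dominates(a: Point, b: Point) -> bool:
    """True when a is at least as good as b in both objectives and better in one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Eleven weights 0, 0.1, ..., 1, matching the grid described for Fig. 6.
lambdas = [round(0.1 * i, 1) for i in range(11)]

# Hypothetical objective vectors; (85, 70) is dominated by (90, 60).
candidates = [(100.0, 90.0), (90.0, 60.0), (85.0, 70.0), (80.0, 50.0), (60.0, 40.0)]
print(lambdas)
print(non_dominated(candidates))
```

A filter of this kind is a convenient sanity check when heuristic and exact points are plotted together, since a heuristic point may turn out to be dominated by an exact one.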

Fig. 6 Eleven points generated by the exact and proposed MIP-heuristic approaches using the Tchebycheff problem with \(\lambda \in \{0,0.1,\ldots ,1\}\) for the instance with \(\vert J \vert =150\) plots

To illustrate the result of the model after the optimization process, the complete calendar defined by the compromise solution \(\textbf{s}_{0.5}^*\) with \(\vert J \vert =30\) plots is given in the Supplementary Material. It shows which variety was grown in each plot, along with the scheduling of harvest months and their respective deviations from the maturation peak in each of the \(\vert C \vert =4\) cuts.

6.2 Computational tests with semi-random instances

To validate the proposed MIP-heuristic, a larger number of problems were solved, with some parameters generated semi-randomly within ranges that seek to reproduce the values practiced by the mills. This makes it possible to certify the quality of the MIP-heuristic and increases confidence in it for solving this problem. The same five instance sizes used previously were adopted for these tests, with \(\vert J \vert =30\), 65, 150, 300 and 500 plots. For each instance, 20 examples were generated whose parameters were drawn from a uniform distribution (according to the tables presented in the Supplementary Material). In total, 100 problems were solved.

In these experiments, the two lexicographic solutions were determined for each example by the two approaches (\(\textbf{s}^*_1\), \(\textbf{s}^*_2\), \(\textbf{s}^A_1\) and \(\textbf{s}^A_2\)). First, the approximate lexicographic solutions were determined (\(\textbf{s}^A_1\) and \(\textbf{s}^A_2\), Steps 1 and 2, see Fig. 1). Next, the exact method was run in the Gurobi software with the approximate solutions provided as incumbents (Step 3, see Fig. 1). As stopping criteria, Gurobi was given a maximum of 1 h to solve each problem and stopped earlier when the Gap fell below 1%.

Table 4 presents the numerical results obtained by comparing the heuristic solutions with those produced by the exact method. In each instance, the following values were calculated among the solutions of the 20 solved problems: average relative variation (in %) in production volumes and cost, \(\overline{\Delta v_1}\) and \(\overline{\Delta v_2}\), respectively. The relative variations are given in Eq. (40):

$$\begin{aligned} \Delta v_k = \left[ \frac{v_k(\textbf{s}_k^*) - v_k(\textbf{s}_k^A)}{v_k(\textbf{s}_k^*)} \right] \times 100, \quad k \in \{1,2\}. \end{aligned}$$
(40)

If \(\Delta v_1 > 0\), then the exact solution is better than the heuristic solution for objective \(v_1\). On the other hand, if \(\Delta v_2 <0\), the exact solution is better than the heuristic for objective \(v_2\). With the intention of evaluating the worst performance of the heuristic, the maximum relative variations (in %) in production and cost (\(\max \Delta v_1\) and \(\max \Delta v_2\), respectively) were calculated. Also, the proportion was determined in which the variations in production volume and cost were equal to zero (% of \(\Delta v_1 =0\) and % of \(\Delta v_2 =0\), respectively). Computational performance measures were determined such as average CPU time of the heuristic and the exact method (sec.) and the average %Gap of the solutions obtained by Gurobi.
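The statistics reported in Table 4 follow directly from Eq. (40). The Python sketch below computes the relative variation for each example and then the mean, the largest variation in absolute value and the percentage of examples with zero variation. The function names and the sample values are hypothetical, and taking the maximum in absolute value is an assumption about how the worst case is measured.

```python
from typing import Dict, List


def relative_variation(exact: float, heuristic: float) -> float:
    """Eq. (40): relative variation (in %) between exact and heuristic values."""
    return (exact - heuristic) / exact * 100.0


def summarize(exact_vals: List[float], heur_vals: List[float]) -> Dict[str, float]:
    """Mean, largest absolute value and share of zeros of the variations."""
    deltas = [relative_variation(e, h) for e, h in zip(exact_vals, heur_vals)]
    n = len(deltas)
    return {
        "mean": sum(deltas) / n,
        "max_abs": max(abs(d) for d in deltas),
        "pct_zero": 100.0 * sum(1 for d in deltas if d == 0.0) / n,
    }


# Hypothetical production volumes (v1): positive variation favors the exact method.
stats_v1 = summarize([100.0, 200.0, 400.0], [100.0, 199.0, 398.0])
# Hypothetical costs (v2): negative variation favors the exact method.
stats_v2 = summarize([50.0, 80.0], [50.0, 82.0])
print(stats_v1)
print(stats_v2)
```

Note the opposite signs: because \(v_1\) is maximized and \(v_2\) is minimized, an improvement by the exact method appears as a positive \(\Delta v_1\) but a negative \(\Delta v_2\).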

Table 4 Comparative analysis of MIP-heuristic performance: mean relative error of the objective values, maximum relative error, percentage of times that the solutions of the exact and heuristic approaches are equal, computational effort and average of optimality gap of the exact method

As expected, the exact method improves the heuristic solutions in both objectives. However, this improvement is very modest when the computational effort of the solver is taken into account. For example, for \(\vert J \vert =500\), the lexicographic solution with maximum production, \(\textbf{s}_1^*\), is only 0.02% better for objective \(v_1\) than \(\textbf{s}_1^A\) (on average), whereas for objective \(v_2\) it is 0.02% worse (on average). This observation is valid for the other instances. Similar results are obtained when the minimum cost solutions of the heuristic and exact methods are compared for objective \(v_2\), with even less perceptible variations (less than 0.12% in the cost).

The maximum values of the errors indicate slightly better performance in the maximum production solutions, whose maximum error is only 0.52% when \(\vert J \vert =30\) and decreases to 0.08% when \(\vert J \vert =500\). This decrease in error is due to the solver having greater difficulty in improving the incumbent solution because of the larger number of binary variables and constraints. These statements are confirmed by the rate at which the solver failed to improve the heuristic solutions. In the largest instance, when maximizing the objective \(v_1\), the solver failed to progress on both objectives in 1 h in 75% of the examples; this proportion is even higher for the least-cost solution (85% of the examples). This limited progress is explained by the optimality gaps provided by Gurobi, which average 1.26%. This means that, for a large share of the solved problems, the heuristic solutions are less than 1.26% away from the optimal solutions while consuming around a tenth of the computational time.

These findings corroborate the results of Subsection 6.1 and provide a certificate of quality and computational effectiveness for the developed MIP-heuristic. With a reasonable computational effort, the proposed algorithm can produce solutions of excellent quality for a combinatorial problem that is difficult to solve even with the best commercial optimization solvers.

7 Conclusions

This study presents a bi-objective model for the problem of planting and harvesting sucrose cane and energy-cane, aiming to maximize the production of sucrose and fiber while minimizing the operational cost of the chain from a multi-objective perspective. With the selection of sucrose cane and energy-cane varieties, the model determines a planting and harvesting schedule, taking into account the operational constraints and the monthly demands at the mill.

Due to its combinatorial nature and high number of binary variables, the proposed bi-objective model is challenging to solve using exact optimization methods. Even in small instances, tests revealed that the solver can take up to 4 h to determine an integer solution. Simulations with other heuristics (implemented internally by the solver) were also unable to solve this problem in an acceptable time, not even with an imposed upper limit of 12 h. To overcome these drawbacks and solve the mathematical model in real-world applications, a MIP-heuristic was proposed using the RF&FO principles. This approach has been adapted to the bi-objective problem to obtain approximations to the efficient solutions.

Computational tests were divided into two parts to validate the proposed mathematical model and the solution approach. The first explored the trade-off in five instances based on the parameters practiced by sugarcane mills in the Southeast region of Brazil. The analysis revealed that the problem has an intense conflict between the cost and production volume objectives, justifying the need for a multi-objective approach. On average, while the maximum and minimum possible production volumes vary by 25%, the associated costs between the worst and best scenarios vary by nearly 34%. In order to balance the disparities in the objective values of the lexicographic solutions, the study compares, for different instances, the compromise solutions in which the deviations from the ideal point are weighted equally for both objectives. These results present an alternative that balances the objectives and allows the decision-maker to measure how much one objective worsens as the other improves, and vice versa, according to the interests of the mill. In terms of cost per production volume, this alternative provides the best cost-benefit ratio among the solutions determined. The proposed approach can generate as many efficient solutions as desired, including the compromise solution, by simply varying a single parameter according to the preference of the decision-maker.

The second part of the computational tests evaluated the proposed approach using one hundred instances generated semi-randomly. The results revealed the good quality of the MIP-heuristic, both in computational performance and proximity to the solutions determined by the exact method. The instances with 500 plots have more than 3 million binary variables, which could be solved with the proposed procedure in less than 10 min per efficient solution (on average). Moreover, there was no need for the exact method to improve the solution obtained by the heuristic method in more than 75% of the problems solved, demonstrating the good performance of this method. When there was a small improvement obtained by the exact method, the difference was always close to or slightly greater than 1%, according to the results shown by the duality gaps. For such solutions, Gurobi has provided a maximum error gap of 1.43% to the optimal values (worst case). This indicates that all the problems solved can have their (potentially) Pareto optimal solutions efficiently determined by the proposed technique.

7.1 Future research directions

Directions for future research include developing metaheuristics (e.g. evolutionary algorithms) or alternative MIP-heuristics to obtain approximate solutions to this problem in instances with more plots (\(> 2000\)). Another option would be to consider stochasticity in some of the parameters of this model, including demand and productivity. The model can also be extended to include equipment scheduling and harvesting fronts, such as the routing of harvesters between farms. The inclusion of other practical constraints, such as the capacity of harvesting machines and mechanized devices on the farm, is also possible. Finally, it would be interesting to consider a more comprehensive model in which a percentage of plots has already been cultivated and only the harvest needs to be scheduled over the years.