Introduction

Worldwide, the agriculture and livestock sectors contribute significantly to anthropogenic emissions of greenhouse gases (GHGs), predominantly methane (CH4) (Riaño and García-González 2015; Ngwabie et al. 2018). Globally, about 18% of anthropogenic GHG emissions come from livestock production, and 37% of anthropogenic CH4 is generated by ruminant enteric fermentation and manure management alone (Steinfeld et al. 2006). Among livestock animals, pigs contribute about 13% of total GHG emissions, making them the second-highest source of GHG emissions in the livestock sector (FAO 2011). In recent years, total pig production in Korea has increased rapidly owing to changes in dietary habits (Lu et al. 2008; Wang et al. 2010); hence, CH4 emissions from pig farming are significant. Ji and Park (2012) noted that CH4 emissions from manure management in Korea grew by 2.6% annually from 1990 to 2009, a substantial contribution to the GHGs driving climate change. Therefore, CH4 emission modelling can act as a preliminary step towards controlling GHG emissions. It gives a clear view of the CH4 emission rate from pig manure based on the quantity of feed intake and the age of the pig, which might help to reduce the CH4 emission rate in pig barns. Within this context, this study was conducted to characterize and quantify pig manure for methane emission modelling.

Methane emission from pig manure is mainly affected by environmental conditions, manure properties (e.g., moisture content and pH), manure management practices, and the portion of manure that decomposes anaerobically (Liang et al. 2005; Cortus et al. 2012; Hetchler et al. 2015). Moreover, manure properties depend primarily on the nutrient value of the feed given to the pig and the age of the pig. For example, the nitrogen excretion rate, expressed as a percentage of pig bodyweight, decreased when the concentration of protein in the diet was reduced (Ogejo 2019). Since several factors influence methane generation, it is essential to select input variables that are informative for methane emission modelling. Moreover, to maximize accuracy and minimize estimation error, it is important to develop a precise model with the best architecture. Recently, a number of emission models, such as regression equations, Manure-DeNitrification DeComposition (Manure-DNDC), and empirical models of CH4 emissions, have been used to estimate the CH4 emission rate (Olesen et al. 2006; Li et al. 2012; Petersen et al. 2016; Ngwabie et al. 2018). For example, the CH4 production rate was described as an exponential function of pig mass with a coefficient of determination (R2) > 88% (Ngwabie et al. 2018). Manure-DNDC results showed that crop cultivation and lagoon coverage combined with feed quality changes could reduce GHG emissions by 30% at the farm scale (Li et al. 2012). These CH4 emission models were developed using various methodologies, including emission factors, empirical equations, and process-oriented mechanisms. The Intergovernmental Panel on Climate Change (IPCC 2006) also prepared guidelines for estimating country-specific CH4 emissions from the amount of volatile solids in manure (tier 2 approach). Several studies have applied the tier 2 approach to calculate country-specific CH4 emission rates (ANIR 2012; Environment-Canada 2012; Du Toit et al. 2013; Ngwabie et al. 2018).

Identifying the best model among the available mathematical and statistical techniques can be challenging. Building on previous research, the current study evaluates the performance of five models, i.e., multiple linear regression (MLR), polynomial regression (PR), ridge regression (RR), random forest regression (RFR), and artificial neural network (ANN), for CH4 emission modelling. Regression-based modelling is a general term for a group of diverse statistical methods that are widely applied in different fields (Seuront 2010; Cranford et al. 2011; Hirst 2012; Carey et al. 2013; Beninger and Boldina 2014; Ngwabie et al. 2018; Basak et al. 2019; Ekine-Dzivenu et al. 2020; Font-i-Furnols et al. 2021). In contrast, machine learning (ML) models have been used to handle non-linear data and achieve better prediction accuracy (Joharestani et al. 2019; Shin et al. 2000). However, only a limited number of studies have investigated ANN and RFR algorithms for environmental pollution modelling (Rybarczyk and Zalakeviciute 2018; Shahriar et al. 2020).

Estimating CH4 emission from pig manure requires an elaborate process including the establishment of the experimental setup, maintaining environmental conditions, dietary composition, quantity of feed intake, etc. Additionally, the CH4 emission rate varies considerably from year to year and within the same experimental pig barn, making it even more challenging to measure. In that sense, regression and ML techniques might be useful in predicting CH4 emission. These techniques reduce sampling efforts and cost and increase precision where samples are difficult to handle (Basak et al. 2019). As such, the objectives of the research are to characterize and quantify the daily manure production rates for its moisture, dry matter (DM), ash, and volatile solid (VS) daily excretion rates and finally to model CH4 production rates as a function of the feed intake and mass of pigs using regression and ML methods.

Materials and methods

Animal resources, experimental design, and data collection

The research methods and procedures were approved by the ethics and animal experimentation committee of Gyeongsang National University (certification# GNU-150508-R0029). Two independent experiments were performed from 1 September to 1 December in 2019 and 2020 with six 2-month-old Yorkshire pigs housed in three experimental pig barns at the Smart Farm Systems Laboratory of Gyeongsang National University. The three pig barns had slatted floors, sidewalls made of galvanized steel and plywood, and expanded polystyrene roofs. In each experimental period, pigs of similar age and weight were studied with three concentrated diets (Table 1). Because pigs of similar size were kept in all barns, manure production rates were expected to be similar. A two-week observation phase was carried out before the experiment began to define the best data measuring conditions. In both experimental periods, the Health-Feed Gatdon3 (Daejoo Inc., Seoul, Republic of Korea) concentrated diet was provided to the pigs in pig barn 1 (PB1), G Max Care (Growing pigs) (Nonghyup Feed Co., Ltd., Seoul, Republic of Korea) to those in pig barn 2 (PB2), and Growing Pigs Late Feed 10 (Nonghyup Feed Co., Ltd., Seoul, Republic of Korea) to those in pig barn 3 (PB3). Equal amounts of feed were provided twice a day, at 10:00 and 17:00, and feed intake was estimated from the daily records of feed offered and leftovers for each pig. A load cell was used to determine each pig’s mass by averaging weights measured twice a day. Moreover, drinkers and feeders were installed in the barns so that the pigs could be restrained with halters for feeding and drinking (Fig. 1).

Table 1 Three concentrate diets and their ingredients
Fig. 1 Layout of the experimental pig barn

Polythene sheets were placed under each pen to capture urine and faecal matter so that the manure production rate could be calculated for each diet. Sheets slightly larger than the pen were used to cover the full surface area underneath it, and their edges were raised to stop urine and faecal matter from escaping. The sheets were placed every morning before feeding and removed 24 h later (Ngwabie et al. 2018). The manure collected in each pig barn was weighed to determine the total production of all pigs over 24 h. To estimate the manure produced by a pig in a day (kg manure pig−1 day−1), the total amount of manure was divided by the number of pigs. A small portion of the manure obtained from each pig barn was subsequently analysed in the laboratory for its moisture, DM, ash, and VS contents. Moreover, the pig’s body surface temperature (PBT) was measured twice a day, at 10:00 and 17:00, using infrared sensors (IR sensor, model-MI3, Raytek Corporation, CA, USA). Temperature, humidity, and carbon dioxide data inside and outside the pig barns were collected using livestock environment management systems (LEMS, AgriRoboTech Co., Ltd, Republic of Korea) and weather sensors (MetPRO, Campbell Scientific, USA), respectively.

Measurement of manure parameters

The pH of each manure sample was measured daily using a portable pH meter (HP9010, Trans Instruments (S) Pte Ltd, Singapore). The DM and VS in manure were determined according to method 1648 of the U.S. Environmental Protection Agency (Telliard 2001). The overall procedure was conducted in the following steps. (i) A portion of manure from each sample was weighed using an electronic mass balance (model-FX-300iWP, A&D Company Limited, Tokyo, Japan) to obtain its wet mass (Mw). (ii) It was oven-dried at 105 °C for 12 h in a 5E-DHG6340 drying oven (Shelves for 5E-DHG6310: 2 Layers, Changsha Kaiyuan Instruments Co., Ltd, China), after which it was weighed to obtain the dry mass (Md). (iii) Eqs. (1), (2), and (3) were applied to calculate the percentage of DM, the daily DM excreted by each pig, and the moisture content of the manure, respectively. (iv) To determine the VS content, a part of each oven-dried sample was weighed in a crucible of known mass Mc to obtain the combined mass (M1). The crucibles with samples were then placed in a muffle furnace (Digital Muffle Furnace 14 Lit “FX-14” 1,000 °C, Korea) and heated at 450 °C for 4 h. After the sample had cooled in the furnace, the crucible with the remaining ash was weighed again to obtain the mass M2. The VS and ash as percentages of the DM were calculated according to Eqs. (4) and (5). (v) Finally, Eq. (6) was used to calculate the daily VS excreted per pig.

$$DM=\frac{{M}_{d}}{{M}_{w}}\times 100$$
(1)
$${DM}_{pig}=\frac{DM\mathrm{\%}}{100}\times {M}_{Pig}$$
(2)
$${MC} \left(\mathrm{\%}\right)=\frac{{M}_{w}-{M}_{d}}{{M}_{w}}\times 100$$
(3)
$${VS}_{DM}\left(\mathrm{\%}\right)=\frac{\left({M}_{1}-{M}_{c}\right)-\left({M}_{2}-{M}_{c}\right)}{{M}_{1}-{M}_{c}}\times 100$$
(4)
$${Ash}_{DM}\left(\%\right)=100-{VS}_{DM}\left(\%\right)$$
(5)
$${VS}_{pig}=\frac{{VS}_{DM}(\mathrm{\%})}{100}\times {DM}_{pig}$$
(6)
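
The calculations in Eqs. (1)–(6) can be sketched as a short Python function. This is only an illustration: the function name, the argument names, and the sample masses are hypothetical and were not taken from the experiment, and VS is expressed as a percentage of DM so that it agrees with Eq. (5).

```python
def manure_fractions(m_wet, m_dry, m_crucible, m1, m2, manure_per_pig):
    """Apply Eqs. (1)-(6) to one manure sample.

    m_wet, m_dry   : sample mass before / after oven drying (Mw, Md)
    m_crucible     : mass of the empty crucible (Mc)
    m1, m2         : crucible + sample before / after the muffle furnace (M1, M2)
    manure_per_pig : daily manure mass per pig (M_pig), kg pig^-1 day^-1
    """
    dm_pct = m_dry / m_wet * 100.0                     # Eq. (1)
    dm_pig = dm_pct / 100.0 * manure_per_pig           # Eq. (2)
    moisture_pct = (m_wet - m_dry) / m_wet * 100.0     # Eq. (3)
    dry_sample = m1 - m_crucible                       # dried sample placed in the crucible
    ash = m2 - m_crucible                              # residue left after ignition
    vs_pct = (dry_sample - ash) / dry_sample * 100.0   # Eq. (4), as a percentage of DM
    ash_pct = 100.0 - vs_pct                           # Eq. (5)
    vs_pig = vs_pct / 100.0 * dm_pig                   # Eq. (6)
    return {
        "dm_pct": dm_pct,
        "dm_pig": dm_pig,
        "moisture_pct": moisture_pct,
        "vs_pct": vs_pct,
        "ash_pct": ash_pct,
        "vs_pig": vs_pig,
    }


# Hypothetical sample values, chosen only to exercise the formulas.
result = manure_fractions(
    m_wet=100.0, m_dry=33.5, m_crucible=20.0, m1=30.0, m2=22.9, manure_per_pig=4.78
)
print(result)
```

By construction, the DM and moisture percentages sum to 100, as do the VS and ash percentages.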

Measurement of methane production rate

The CH4 production rate from pig manure was calculated using the IPCC tier 2 approach shown in Eq. (7) (IPCC 2006). Several studies have used the same approach to calculate the CH4 production rate (ANIR 2012; Du Toit et al. 2013; Shin et al. 2016; Ngwabie et al. 2018). The IPCC tier 2 approach requires country-specific input values for the VS excreted in manure, the maximum CH4 producing capacity (B0) of pig manure, and the CH4 conversion factor (MCF). Here, MCF indicates the proportion of the theoretical maximum CH4 yield that is actually achieved from the VS in manure.

$$EF=VS\times {B}_{0}\times MCF\times 0.67$$
(7)

where EF is the CH4 production rate (kg CH4 pig−1 day−1) and VS is the daily volatile solids excreted (kg VS pig−1 day−1) in manure in the present experiment. The B0 value depends primarily on the type of diet (Kumar et al. 2014). It is reported that feeding a high-forage diet leads to more methane emissions than a concentrated diet (Won et al. 2014). In Korea, pig-breeding practice favours concentrated diets over forage (Won et al. 2014). Thus, in the present work, a B0 value of 0.0579 m3 CH4 kg−1 of VS excreted (Won et al. 2014) and an MCF value of 0.39 (Park et al. 2006) were used to calculate the CH4 production rate, while 0.67 is the conversion factor from m3 CH4 to kg CH4.
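
As a worked illustration of Eq. (7), the following sketch evaluates the emission factor with the B0, MCF, and density values quoted above. The function name and the VS input of 1.0 kg pig−1 day−1 are hypothetical and serve only to show the arithmetic; MCF is entered as a fraction (0.39).

```python
B0 = 0.0579         # m3 CH4 per kg VS excreted (value quoted in the text)
MCF = 0.39          # CH4 conversion factor, entered as a fraction
CH4_DENSITY = 0.67  # kg CH4 per m3 CH4


def ch4_emission_factor(vs_per_day, b0=B0, mcf=MCF, density=CH4_DENSITY):
    """Eq. (7): EF = VS x B0 x MCF x 0.67, in kg CH4 pig^-1 day^-1."""
    return vs_per_day * b0 * mcf * density


# Hypothetical input: 1.0 kg VS pig^-1 day^-1.
ef_daily = ch4_emission_factor(1.0)
ef_yearly = ef_daily * 365  # simple scaling to kg CH4 pig^-1 year^-1
print(f"{ef_daily:.5f} kg CH4 per pig per day, {ef_yearly:.2f} kg CH4 per pig per year")
```

Because Eq. (7) is linear in VS, doubling the daily VS excretion doubles the estimated CH4 production rate.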

Data analysis and model development

Data obtained over the two experimental periods from the three pig barns were used to develop statistical and machine learning models for CH4 emission modelling. During model preparation, the mass of pig (MP), age, and feed intake (FI) were considered as input variables. However, high multicollinearity was found between body weight and age (correlation coefficient (r) = 0.93); thus, only MP and FI were selected as input variables in the present study. The Z-score normalization technique (Eq. 8) was applied to all numeric columns used in the model to bring their values onto a common scale.

$$Z=\frac{x-\mu }{\sigma }$$
(8)

where Z is the standard score; x is a value in the data set; μ is the mean of all values in the data set; and σ is their standard deviation.
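
A minimal sketch of the Z-score normalization in Eq. (8) is given below. It assumes the population standard deviation is used for σ (the text does not state which estimator was applied), and the pig masses are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Eq. (8): Z = (x - mu) / sigma for every value in one column."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(x - mu) / sigma for x in values]


# Hypothetical pig masses in kg.
mass = [60.0, 70.0, 80.0, 90.0]
z = z_scores(mass)
print([round(v, 4) for v in z])
```

After the transformation the column has zero mean and unit standard deviation, so MP and FI enter the models on a common scale.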

Recently, many statistical and machine learning (ML) methods have been used for both prediction and inference in different research fields. In this study, two statistical models, i.e., multiple linear regression (MLR) and polynomial regression (PR), and three ML algorithms, i.e., ridge regression (RR), random forest regression (RFR), and a feed forward-back propagation (FFBP) ANN, were evaluated on how well they predicted CH4 emission from the four datasets presented in Table 2. Statistical models were used mainly because they support intuitive visualization of data, which helps in identifying relationships between variables and making predictions. ML models, on the other hand, focus on prediction, employing general-purpose learning algorithms to uncover patterns in often complex and unwieldy data. Moreover, these ML models were chosen for their ability to capture non-linear interactions between the experimental and predictor variables. In the following sections, the statistical and ML approaches used for predicting CH4 emission are briefly discussed.

Table 2 Year-wise diet compositions, pig barns, datasets, and number of data points

Multiple linear regression

Multiple linear regression (MLR) can model the relationship between the explanatory variables (MP and FI) and the response variable (CH4 emission rate) simply and comprehensively (Tabachnick and Fidell 2001). In the present study, the MLR model was developed according to Eq. (9) (Darlington and Hayes 2016):

$${Y}_{\mathrm{i}}={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\dots +{\beta }_{n}{X}_{n}+\varepsilon$$
(9)

where Yi is the CH4 emission rate, β0–βn are the coefficients of regression, X1–Xn are the input variables, and ε is the error associated with the ith observation.
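
To illustrate how the coefficients in Eq. (9) can be estimated, the sketch below fits an MLR by ordinary least squares with NumPy. The data are hypothetical and noise-free, generated from known coefficients so that the fit can be checked; they are not the experimental measurements, and this is not necessarily the routine used in the study.

```python
import numpy as np

# Hypothetical inputs: columns are pig mass (kg) and feed intake (kg per day).
X = np.array([
    [60.0, 2.0],
    [65.0, 2.3],
    [70.0, 2.4],
    [80.0, 2.9],
    [90.0, 3.1],
])

# Hypothetical "true" coefficients (beta0, beta1, beta2), used to build y without noise.
true_beta = np.array([0.002, 0.0001, 0.004])
y = true_beta[0] + X @ true_beta[1:]

# Add a column of ones for the intercept and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta_hat)  # estimated (beta0, beta1, beta2)
```

With noise-free data the estimated coefficients coincide with the generating ones up to rounding; with real measurements the residual ε absorbs the remaining variation.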

Polynomial regression

Polynomial regression (PR) is a form of regression in which a non-linear relationship between the explanatory and response variables is modelled as an nth-degree polynomial. Because the model remains linear in its coefficients, PR is considered a particular case of the MLR model (Ostertagová 2012). The general form of a complete second-degree PR model with two independent variables X1 and X2 is shown in Eq. (10).

$${\mu }_{y\mid {X}_{1},{X}_{2}}={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+{\beta }_{3}{X}_{1}^{2}+{\beta }_{4}{X}_{2}^{2}+{\beta }_{5}{X}_{1}{X}_{2}+\varepsilon$$
(10)

where \(\mu_{y\mid X_1,X_2}\) is the true mean response for the two independent variables (MP and FI), β0–β5 are the model parameters, and X1 and X2 are the independent variables. After experimenting with different polynomial degrees (order = 2, 3, 4, 5), a third-degree polynomial regression was adopted in the present work because it performed better than the others.
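
The polynomial terms can be generated mechanically, after which the model is fitted exactly like an MLR. The following sketch (the function name is hypothetical) lists all monomials of two variables up to a chosen total degree; note that the cross term appears before the second squared term, so the ordering differs slightly from Eq. (10).

```python
from itertools import combinations_with_replacement


def polynomial_features(x1, x2, degree):
    """Return all monomials of x1 and x2 up to the given total degree.

    For degree 2 the order is [1, x1, x2, x1^2, x1*x2, x2^2].
    """
    features = []
    for d in range(degree + 1):
        # Each combination of d factors drawn from (x1, x2) is one monomial.
        for combo in combinations_with_replacement((x1, x2), d):
            term = 1.0
            for value in combo:
                term *= value
            features.append(term)
    return features


print(polynomial_features(2.0, 3.0, 2))       # second-degree terms, as in Eq. (10)
print(len(polynomial_features(2.0, 3.0, 3)))  # number of terms in a third-degree model
```

A complete second-degree model in two variables has six terms and a third-degree model has ten, which is why higher degrees quickly increase the number of coefficients to estimate.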

Ridge regression

In the present work, a penalty-based regression procedure, ridge regression (RR), was also used to model CH4 emission. RR has been widely used where many characteristics of a single sample are measured simultaneously (Tibshirani 1996; McDonald 2009; Ransom et al. 2019; Wieringen 2020). It is a continuous form of shrinkage in which the residual sum of squares is minimized while each parameter's coefficient is shrunk towards zero, thus reducing the importance or influence of any particular parameter (Hoerl and Kennard 2000; Ransom et al. 2019). The RR objective is very close to that of ordinary least squares, but a penalty term is added to the minimization, as shown in Eq. (11).

$${L}_{ridge}(\hat{\beta })=\sum_{i=1}^{n}{\left({y}_{i}-{x}_{i}^{\prime}\hat{\beta }\right)}^{2}+\lambda \sum_{j=1}^{m}{\hat{\beta }}_{j}^{2}={\Vert y-X\hat{\beta }\Vert }^{2}+\lambda {\Vert \hat{\beta }\Vert }^{2}$$
(11)

where \({\Vert y-X\hat{\beta }\Vert }^{2}\) is the residual sum of squares (RSS), which serves as the loss function, \({\Vert \hat{\beta }\Vert }^{2}\) is the sum of the squares of all coefficients, and λ is the regularization penalty parameter.
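
The ridge objective has a well-known closed-form minimizer, (X′X + λI)⁻¹X′y, which the sketch below implements with NumPy. The data are synthetic and hypothetical, the inputs are assumed to be standardized and the response centred (so no intercept is fitted), and this is not necessarily the solver used in the study.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimise ||y - X b||^2 + lam * ||b||^2 via (X'X + lam I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic, hypothetical data standing in for standardised MP and FI.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([0.8, 0.3]) + rng.normal(scale=0.1, size=50)
y = y - y.mean()  # centre the response so that no intercept is needed

b_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
b_ridge = ridge_fit(X, y, lam=10.0)  # a positive penalty shrinks the coefficients

print(b_ols, b_ridge)
```

Setting λ = 0 recovers the least-squares solution, while increasing λ shrinks the coefficient vector towards zero, which is the behaviour described above.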

Random forest regression (RFR)

Random forest regression (RFR) has recently come to be considered one of the most effective machine learning algorithms (Biau and Scornet 2016). It has been applied to a wide range of learning tasks, most prominently classification and regression (Biau et al. 2008; Biau 2012; Denil et al. 2014; Ransom et al. 2019; Shahriar et al. 2020). An RFR model is an ensemble of trees, each of which is constructed randomly. After the ensemble has been built, the RFR model makes predictions by averaging the predictions of the individual trees. Even for very high-dimensional problems, RFR often makes accurate and robust predictions (Biau 2012). The random forest estimator associated with the tree collection is defined by Eq. (12).

$${\tilde{\eta }}_{n,{\mathbb{V}}_{T}}\left(x\right):=\frac{1}{T}\sum\limits_{j=1}^{T}{\tilde{\eta }}_{n,{A}_{j}}\left(x\right)=\frac{1}{T}\sum\limits_{j=1}^{T}\frac{1}{N\left({A}_{j}\left(x\right)\right)}\sum\limits_{i=1}^{n}{Y}_{i}{1}_{{X}_{i}\in {A}_{j}\left(x\right)}$$
(12)

A more comprehensive introduction to random forest algorithm was reported by Friedman et al. (2009) and Biau and Scornet (2016).
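
The averaging structure of Eq. (12) can be illustrated with a deliberately simplified forest. In the sketch below, each "tree" is a single purely random split (a stump), the leaf containing x plays the role of Aj(x), and the number of training points in that leaf plays the role of N(Aj(x)). This is an assumption-laden toy, not the RFR implementation used in the study: practical random forests grow deeper trees on bootstrap samples with optimized splits. All data and function names are hypothetical.

```python
import random


def build_stump(X, rng):
    """Draw one purely random split: a feature index and a threshold."""
    feature = rng.randrange(len(X[0]))
    column = [row[feature] for row in X]
    threshold = rng.uniform(min(column), max(column))
    return feature, threshold


def tree_predict(x, X, Y, stump):
    """Average Y_i over the training points that share the leaf A_j(x) with x."""
    feature, threshold = stump
    side = x[feature] <= threshold
    leaf = [y for row, y in zip(X, Y) if (row[feature] <= threshold) == side]
    if not leaf:  # guard against an empty leaf
        return sum(Y) / len(Y)
    return sum(leaf) / len(leaf)


def forest_predict(x, X, Y, stumps):
    """Eq. (12): average the per-tree leaf means over all T trees."""
    return sum(tree_predict(x, X, Y, s) for s in stumps) / len(stumps)


# Hypothetical training data: [pig mass (kg), feed intake (kg/day)] -> CH4 rate.
rng = random.Random(0)
X = [[60.0, 2.0], [70.0, 2.4], [80.0, 2.9], [90.0, 3.1]]
Y = [0.012, 0.016, 0.021, 0.025]

stumps = [build_stump(X, rng) for _ in range(200)]
pred = forest_predict([65.0, 2.2], X, Y, stumps)
print(pred)
```

Because every tree returns a mean of observed responses, the forest prediction always lies within the range of the training responses, which is one reason random forests cannot extrapolate beyond the observed data.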

Artificial neural network

Artificial neural network (ANN) models consist of interconnected artificial neurons that transfer signals to one another along weighted connections (Feng et al. 2015). Input, hidden, and output layers are needed to create an ANN topology. Every input value is regarded as a neuron in the input layer. All input values are weighted randomly at first, and the weighted values are then passed to the hidden layers. In the hidden layers, every neuron produces output values that are passed on towards the output layer. ANN models are widely used to analyse non-linear data (Shin et al. 2000). Different ANN architectures, such as the feed forward-back propagation neural network (Basak et al. 2020), adaptive logic network (Qu et al. 2001), radial basis function network (Boilot et al. 2002), self-organizing map network (Sinesio et al. 2000), time-delay neural network (Zhang et al. 2003), and hybrid Bi-GRU-ARIMA model (PAHM) (Jaihuni et al. 2020), have been applied to analyse data in a number of studies. After experimenting with several multilayer perceptron (MLP) structures with different numbers of neurons and three transfer functions (log-sigmoid, linear (purelin), and tan-sigmoid), an FFBP neural network with a gradient descent weight and bias learning function, one hidden layer, and a log-sigmoid transfer function was adopted. The output of the ANN is given by Eq. (13) (Hydrology 2000).

$${y}_{t}={\alpha }_{0}+\sum\limits_{j=1}^{n}{\alpha }_{j}f\left(\sum\limits_{i=1}^{m}{\beta }_{ij}{y}_{t-i}+{\beta }_{0j}\right)+{\varepsilon }_{t}$$
(13)

where yt is the network output (in the present study, the CH4 emission rate), n is the number of hidden nodes, m is the number of input nodes, f is the transfer function, βij {i = 0, 1, …, m; j = 1, 2, …, n} are the weights from the input to the hidden nodes, αj {j = 0, 1, …, n} are the weights from the hidden to the output nodes, and α0 and β0j denote the weights of arcs leading from the bias terms.
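
The forward pass of a one-hidden-layer network with a log-sigmoid transfer function can be sketched as follows. The weights and inputs are hypothetical placeholders rather than trained values, the inputs are taken to be the (normalized) predictors instead of lagged outputs, and the training step (back propagation with gradient descent) is omitted.

```python
import math


def logsig(z):
    """Log-sigmoid transfer function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: output = alpha_0 + sum_j alpha_j * f(sum_i beta_ij * x_i + beta_0j)."""
    hidden = []
    for w_j, b_j in zip(hidden_weights, hidden_biases):
        z = sum(w * x for w, x in zip(w_j, inputs)) + b_j
        hidden.append(logsig(z))
    return output_bias + sum(a * h for a, h in zip(output_weights, hidden))


# Hypothetical weights for a network with 2 inputs and 3 hidden neurons.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [0.3, -0.2, 0.5]
output_bias = 0.05

# Hypothetical normalised inputs (e.g. Z-scores of MP and FI).
prediction = forward([0.4, -1.1], hidden_weights, hidden_biases, output_weights, output_bias)
print(prediction)
```

With all hidden weights and biases set to zero, every hidden neuron outputs 0.5, which gives a simple check of the implementation.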

Application methodology and performance metrics

All the statistical and ML models in the present work were developed using open-source libraries in a Python (Python 3.7.0) environment. Python is a high-level, interpreted programming language that serves many purposes, including scientific computing (Tran et al. 2020). Libraries such as NumPy (Van Der Walt et al. 2011), Pandas (McKinney 2010), and Matplotlib (Hunter 2007) were used for processing, manipulating, and visualizing data. In the current study, 70% of the data was selected as the training set, while the remaining 30% was used as the testing set. The performance of the models was evaluated on the basis of two statistical quality parameters, i.e., root mean square error (RMSE) (Eq. 14) and coefficient of determination (R2) (Eq. 15).

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{\mathrm{i}=1}^{\mathrm{n}}{\left({\mathrm{O}}_{i}-{\mathrm{P}}_{i}\right)}^{2}}{\mathrm{n}}}$$
(14)
$$\mathrm R^2={\left(\frac{\sum_{\mathrm i=1}^{\mathrm n}({\mathrm O}_i-\overline{\mathrm O})({\mathrm P}_i-\overline{\mathrm P})}{\sqrt{\sum_{\mathrm i=1}^{\mathrm n}{({\mathrm O}_i-\overline{\mathrm O})}^2}\times\sqrt{\sum_{\mathrm i=1}^{\mathrm n}{({\mathrm P}_i-\overline{\mathrm P})}^2}}\right)}^2$$
(15)

where n is the number of data points, Oi are the observed values, Pi are the predicted values, and the bar denotes the mean of the variable. All statistical calculations in this study were performed with the Statistical Package for the Social Sciences (IBM SPSS Statistics 22.0.0.0, NY, USA) and Origin Pro 9.5.5 (OriginLab, Northampton, MA, USA).
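
The two performance metrics can be computed with a few lines of Python, as sketched below. The function names and sample values are hypothetical, and R2 is assumed here to be the squared Pearson correlation between observed and predicted values.

```python
import math


def rmse(observed, predicted):
    """Eq. (14): root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Eq. (15), taken as the squared Pearson correlation of observed and predicted values."""
    o_bar = sum(observed) / len(observed)
    p_bar = sum(predicted) / len(predicted)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical observed and predicted CH4 emission rates (kg per pig per day).
observed = [0.012, 0.016, 0.021, 0.025]
predicted = [0.013, 0.015, 0.022, 0.024]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE and an R2 closer to 1 indicate better agreement; note that a squared correlation is insensitive to a constant bias in the predictions, which is why RMSE is reported alongside it.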

Results and discussions

Environmental data measurement

Manure temperature, measured in the collected samples, ranged from 25.6 to 31.7 °C, 24.3 to 30.8 °C, and 26.1 to 32.4 °C in pig barns 1, 2, and 3, respectively. Variations of air temperature, humidity, and pig’s body temperature in the three pig barns for both experimental periods are shown in Fig. 2. In summary, the air temperature ranged from 9.4 to 31.6 °C in PB1, 9.1 to 33.3 °C in PB2, and 9.3 to 32.6 °C in PB3 during the two experimental periods in 2019 and 2020. The average pig’s body temperatures for PB1, PB2, and PB3 were 32.28 ± 2.66 °C, 32.29 ± 2.87 °C, and 32.18 ± 2.53 °C, respectively, in 2019, and the corresponding averages for 2020 were 32.53 ± 2.63 °C, 32.01 ± 2.63 °C, and 31.95 ± 2.54 °C. The relationship between ambient environmental parameters and the body temperature of pigs was described in detail in our earlier study (Basak et al. 2020).

Fig. 2 Variations of air temperature and humidity in the three pig barns and of pig’s body temperature in 2019 and 2020 (data were collected twice a day, at 10:00 and 17:00). ART represents the air room temperature, RH represents the room relative humidity, and THI denotes the temperature-humidity index in the pig barn

Estimation of methane production rate

The first step was to determine whether the manure production rate differed significantly among the three concentrated diets. An analysis of variance using the three types of concentrated diets and manure production showed an insignificant two-way interaction (p = 0.85). Manure production rates, moisture and DM ratios, and ash and VS production rates according to the diets are presented in Table 3. Notably, manure production rates and the DM and VS produced in manure increased with increasing pig mass and feed intake. Although the barns received different diets, comparable DM and VS data were measured. Manure DM excretion rates were 1.95 ± 1.10, 1.94 ± 1.13, and 1.86 ± 1.12 kg pig−1 day−1 for PB1, PB2, and PB3, respectively, and manure VS excretion rates were 1.42 ± 0.80, 1.41 ± 0.84, and 1.35 ± 0.82 kg pig−1 day−1 for PB1, PB2, and PB3, respectively, in 2019. There was no statistically significant effect of the concentrated diets on the production rates of DM (p = 0.82) or VS (p = 0.88). Combining the results of the three pig barns showed that, with body mass ranging from 60 to 90 kg, a pig produced around 4.78 ± 1.21 kg of manure per day consisting of 66.51 ± 4.46% moisture and 33.49 ± 4.46% DM. The manure’s ash content was 28.74 ± 4.44% of DM (0.46 ± 0.16 kg pig−1 day−1), while the VS was 71.26 ± 4.44% of DM (1.13 ± 0.32 kg pig−1 day−1). Similar manure characteristics have been reported in other studies (Hamilton et al. 1997; IPCC 2006; Won et al. 2014; Dennehy et al. 2017; Shin et al. 2017; Ngwabie et al. 2018). According to Ngwabie et al. (2018), a 50 kg pig produced approximately 3 kg of manure per day, of which 2.09 kg (~ 70%) was moisture and 0.91 kg (~ 30%) was DM. It has been widely reported that the diet ratio is crucial in DM and VS production. Another study on a corn-based ration showed that pigs of 23–79 kg produced 2.7–3.6 kg of manure per day, of which the VS was 0.24–0.33 kg day−1 as excreted (Hamilton et al. 1997; Chastain et al. 1999). Similarly, Chastain et al. (1999) stated that a 60 kg pig produced about 5 kg of manure per day, of which 0.51 kg was VS.

Table 3 Feed intake, manure production, and its characterization from pigs of different masses and diets. The values in italics indicate the average results in a year among the diets and bold values indicate the average results in 2019 and 2020

The average CH4 production in the three barns is presented in Table 3. The mean CH4 emission rates were 0.021 ± 0.012 kg pig−1 day−1 in PB1, 0.020 ± 0.013 kg pig−1 day−1 in PB2, and 0.019 ± 0.012 kg pig−1 day−1 in PB3. Combining the results from the three barns, the average CH4 production rate was 0.020 ± 0.012 kg pig−1 day−1. Figure 3 shows the relationships between the mass of pigs, feed intake, and manure production from pigs with three concentrated diets (F1, F2, and F3) in 2019 and 2020. The modelled CH4 production ranged from 7.20 to 7.70 kg pig−1 year−1, which is below the IPCC (2006) values specified for market swine in Oceania (11–13 kg pig−1 year−1) and North America (10–23 kg pig−1 year−1) and near the lower end of the range for Western Europe (6–21 kg pig−1 year−1). One of the main reasons for this may be that pigs in Korea are given a concentrated diet rather than forage. Several studies have examined the relation between diets and the CH4 emission rate (Hamilton et al. 1997; Kumar et al. 2014; Won et al. 2014). It is reported that feeding a high-forage diet leads to more CH4 emission than a concentrate-rich diet (Won et al. 2014). This is congruent with Beauchemin et al.’s (2008) findings for both diets, which also show that concentrate-based diets such as starch-rich grains produce less CH4 than forage-based diets. Concentrated diets reduce enteric CH4 production by lowering ruminal fluid pH, which inhibits the capacity of ruminal methanogens to take up hydrogen, and by favouring the production of propionate over acetate (Van Kessel and Russell 1996; Pirondini et al. 2015). Pirondini et al. (2015) revealed that propionate production in the rumen decreases CH4 production because propiogenesis uses metabolic hydrogen that would otherwise be available to produce CH4. For instance, CH4 production decreased by up to 31% when 50% of the dietary forage fed to dairy cows was replaced with a concentrated wheat grain diet (Moate et al. 2014). The difference in CH4 production was attributed to the daily minimum ruminal pH, which was linked to CH4 generation (Moate et al. 2017). Moreover, it should be noted that the manure management system, environmental conditions, and storage time also influence CH4 emission (Wood et al. 2014).

Fig. 3 Mass of pig, feed intake, and manure production from pigs with three concentrated diets (F1, F2, and F3). 2019 a Mass of pig (kg pig−1) vs. CH4 emission rate (kg pig−1 day−1) in 2019; 2019 b Feed intake (g pig−1 day−1) vs. CH4 emission rate (kg pig−1 day−1) in 2019; 2019 c Manure production (kg pig−1 day−1) vs. CH4 emission rate (kg pig−1 day−1) in 2019; 2020 a Mass of pig (kg pig−1) vs. CH4 emission rate (kg pig−1 day−1) in 2020; 2020 b Feed intake (g pig−1 day−1) vs. CH4 emission rate (kg pig−1 day−1) in 2020; 2020 c Manure production (kg pig−1 day−1) vs. CH4 emission rate (kg pig−1 day−1) in 2020

Evaluating statistical algorithms for CH4 emission modelling

The performance of the two statistical models (multiple linear regression and polynomial regression) on the four datasets (F1, F2, F3, and FC) is shown in Table 4. The PR model performed better than the MLR model in the training stage, whereas the performance of the two models was similar in the testing stage. Table 4 indicates that the PR model yielded R2 and RMSE values of 0.914 and 0.0035, which were, respectively, the highest and lowest values in the training set compared with those obtained from MLR. Moreover, compared with the F1, F2, and F3 datasets, PR showed better results for the FC dataset (Table 4). The two models’ overall performance in terms of R2 is reasonably good, indicating that they explained more than 89% of the variation between the measured and predicted data. In general, the efficiency of regression-based statistical models depends on the existence of linear relationships between explanatory and response variables. Owing to their simplicity, these models have been widely used to predict CH4 emission (Petersen et al. 2016; Ngwabie et al. 2018; Hempel et al. 2020). An exponential equation was used for CH4 emission modelling as a function of pigs' mass, where the model explained about 88% of the variation between the measured and predicted data (Ngwabie et al. 2018). The differences between that study and the current one lie in the selection of input variables and the algorithms used for CH4 modelling. As shown in the scatter plots (Fig. 4), the predicted CH4 emission rates follow a distribution pattern very close to that of the measured values, with R2 = 0.892, 0.894, 0.866, and 0.885 for F1, F2, F3, and FC, respectively, in the testing set for MLR; the corresponding R2 values for the PR model are 0.889, 0.879, 0.877, and 0.894.

Table 4 Performance metrics (R2 and RMSE) of the models during testing and training period. The values in italics indicate the best results among the models
Fig. 4 Scatter plots of measured versus predicted CH4 emission rate using F1, F2, F3, and FC datasets and MLR, PR, RR, RFR, and ANN models

Evaluating machine learning algorithms for CH4 emission modelling

Determining the ML algorithm that best relates the mass of pigs and the quantity of feed intake to the CH4 emission rate depends on the investigatory priority. The performance metrics of RR, RFR, and ANN are given in Table 4. The RR model performed slightly better than RFR and ANN in the testing stage, whereas the three models performed almost the same in the training stage. Table 4 indicates that the RR model yielded R2 and RMSE values of 0.908 and 0.0035, which were, respectively, the highest and lowest values in the testing set compared with those obtained from RFR and ANN. Compared with the other machine learning models, the ANN performed slightly worse. Moreover, the ANN model showed somewhat better results with a large dataset; however, RFR was overfitted in the training stage and showed lower performance in the testing stage.

In contrast, RR showed good performance (R2 ≥ 0.90 and RMSE ≤ 0.0038) even with the reduced datasets (Table 4). Several studies have reported that ANN and RFR models generally perform better than RR when a highly non-linear and complex relationship exists between the output and input variables (Singh et al. 2003; Abdel-Rahman et al. 2013; Gholipoor et al. 2013; Khairunniza-Bejo et al. 2014; Mansourian et al. 2017; Qin et al. 2018; Basak et al. 2020). To better understand the distribution of the data and the ML models’ ability to predict the CH4 emission rate, the predicted and measured data for the testing datasets are compared in the scatter plots (Fig. 4). As shown in these plots, the predicted CH4 emission rate follows a distribution pattern very close to that of the measured data, with R2 values of 0.922, 0.904, 0.907, and 0.901 for the F1, F2, F3, and FC datasets, respectively, for the RR model. From this standpoint, RR would be preferred because of its simplicity, the interpretability of its parameters, and its better performance than the other ML algorithms in this study, even with a small dataset.

Model comparison and proposed model

All regression-based models performed better than the artificial neural network models in this study. Compared with the ANN model, the selected RR model predicted the CH4 emission rate with a 2.50% and 6.20% higher R2 and an 11.25% and 17.98% lower RMSE in the training and testing stages, respectively. The superiority of the RR model over the other statistical and ML models is also evident from the standard deviation and correlation metrics (Fig. 5). Presenting the actual and predicted values of the statistical and ML models on a single diagram (Fig. 5) makes the models’ abilities easier to compare. As shown in Fig. 4, the relationship between the input and output variables is approximately linear, which may be one of the main reasons for the performance variation among those models.
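The quantities that a Taylor diagram summarizes can be sketched as below: the standard deviations of the measured and predicted series, their correlation, and the centered root-mean-square difference. This is an illustrative sketch only; the function name and the sample values are hypothetical, and the use of population standard deviations (dividing by n) is an assumption rather than a detail taken from this study.

```python
import math

def taylor_statistics(measured, predicted):
    """Return (std_measured, std_predicted, correlation, centered_rmsd)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Population standard deviations (divide by n); this choice is an assumption.
    std_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    # Pearson correlation between the two series.
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted)) / n
    corr = cov / (std_m * std_p)
    # Centered RMS difference: RMS difference after removing each series' mean.
    crmsd = math.sqrt(
        sum(((p - mean_p) - (m - mean_m)) ** 2 for m, p in zip(measured, predicted)) / n
    )
    return std_m, std_p, corr, crmsd

# Hypothetical values for illustration only (not measurements from this study).
measured = [0.010, 0.020, 0.030, 0.040, 0.050]
predicted = [0.012, 0.019, 0.031, 0.038, 0.052]

std_m, std_p, corr, crmsd = taylor_statistics(measured, predicted)
print(f"std measured = {std_m:.5f}, std predicted = {std_p:.5f}")
print(f"correlation = {corr:.4f}, centered RMSD = {crmsd:.5f}")
```

These statistics satisfy crmsd² = std_m² + std_p² − 2·std_m·std_p·corr, which is why all three can be read from a single point on the diagram; a model plots closer to the reference point when its standard deviation matches the measured one and its correlation is high.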

Fig. 5. Taylor diagram of training and testing results of statistical and machine learning models

Moreover, other studies have also shown that ANN models did not significantly improve prediction accuracy over statistical models when the variables were linearly related (Özesmi et al. 2006; Craninx et al. 2008). The difference in performance among the models in predicting the CH4 emission rate shows the importance of choosing a proper model. From the performance of these models, it can be concluded that the mass of pigs and the quantity of feed intake had an approximately linear relationship with manure production and VS, which are closely associated with the CH4 emission rate. Therefore, for developing statistical and machine learning models to predict CH4 emission from livestock manure, this study recommends the use of regression-based algorithms.
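A regression-based model with the two explanatory variables discussed here can be sketched in a few lines. The sketch assumes that RR denotes ridge regression and fits it in closed form; the function name, the penalty value `lam`, and all data values are hypothetical and do not reproduce the model, hyperparameters, or measurements of this study.

```python
def fit_ridge_two_inputs(x1, x2, y, lam=1.0):
    """Fit y ~ b0 + b1*x1 + b2*x2 with an L2 penalty lam on b1 and b2 only."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center the data so that the intercept b0 is not penalized.
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Entries of the 2x2 system (X^T X + lam*I) b = X^T y.
    a11 = sum(v * v for v in c1) + lam
    a22 = sum(v * v for v in c2) + lam
    a12 = sum(u * v for u, v in zip(c1, c2))
    g1 = sum(u * v for u, v in zip(c1, cy))
    g2 = sum(u * v for u, v in zip(c2, cy))
    # Solve the 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    b1 = (g1 * a22 - g2 * a12) / det
    b2 = (g2 * a11 - g1 * a12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical inputs: pig mass (kg) and feed intake (kg per day).
mass = [60, 65, 70, 75, 80, 85, 90]
feed = [2.0, 2.3, 2.2, 2.6, 2.5, 2.9, 3.0]
# Hypothetical response generated from an exactly linear rule, for illustration.
ch4 = [0.001 + 0.0002 * m + 0.004 * f for m, f in zip(mass, feed)]

b0, b1, b2 = fit_ridge_two_inputs(mass, feed, ch4, lam=1.0)
print(f"intercept = {b0:.5f}, mass coef = {b1:.6f}, feed coef = {b2:.6f}")
```

With `lam=0` the fit reduces to ordinary least squares, and larger values of `lam` shrink the two coefficients toward zero. Because the penalty acts on unscaled inputs in this sketch, the inputs would normally be standardized first in practice. The fitted coefficients remain directly interpretable as the change in the response per unit of mass or feed intake, which is the interpretability advantage noted in the text.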

Conclusion

Measurements were carried out in three experimental pig barns with three different types of concentrated diets to characterize manure production. The quantity of manure produced per pig and its moisture, DM, ash, and VS contents increased with the mass and feed intake of the pigs. A pig with a body mass of 60 to 90 kg produced around 3.35 kg of manure per day, consisting of 66% moisture and 34% DM. The manure's ash content was 28% DM (0.47 kg pig−1 day−1), while the VS was 72% DM (1.15 kg pig−1 day−1). In the present study, the pigs’ mass and the quantity of feed intake were used as explanatory variables to model the CH4 production rate. Five statistical and ML algorithms were evaluated for CH4 emission modelling based on three statistical quality parameters. The results showed that the regression-based models performed better than the ANN model, and the RR model was selected as the best of these models for predicting CH4 production. This advantage of the RR model may stem from the approximately linear association of the mass of pigs and the quantity of feed intake with the CH4 production rate. The RR model explained more than 90% of the variation in the measured data in both the training and testing stages. Given its ease of computation, simplicity, interpretable parameters, and better performance, the RR model might be effective for CH4 emission modelling. However, the input parameters described above may not always relate to CH4 emission from pig manure in the same way. Additionally, attempts to achieve high prediction efficiency with the same attributes may yield different model performance. Therefore, further research might be conducted to improve this model’s prediction accuracy by covering a wider range of diets and management conditions.