1 Introduction

Greenhouses allow farmers to actively control growing conditions such as temperature, light, relative humidity, and CO2 concentration. This control allows crops to be produced year-round regardless of the outside climate. Because of these benefits, greenhouse use and greenhouse size in agriculture are continuously increasing (Guo et al. 2012).

To maximize the benefits of greenhouse cultivation, growing conditions must be controlled efficiently. Studies predicting environmental factors have therefore been conducted steadily to improve greenhouse environmental control (Ehret et al. 2001; Sonneveld et al. 2005; Min et al. 2012; Cha et al. 2016; Yu et al. 2016). However, because greenhouses are not completely isolated from the outside, conditions inside them are also affected by external factors, which makes these changes difficult to predict and control. CO2 concentration is an important environmental factor in greenhouses and has a major influence on crop growth (McMurtrie and Wang 1993). Previous studies have reported the importance of CO2 concentration during crop growth and have attempted to estimate it (Critten 1991). Artificial neural networks (ANNs) trained on overall greenhouse environmental data over a short period have also been used to estimate and control CO2 concentration (Linker et al. 1998).

Recently, eco-friendly greenhouses have been studied with a view to environmental conservation (Cuce et al. 2016). To reduce the CO2 emissions that are accelerating global warming, greenhouses that use surplus resources from power plants have also been studied. Effectively reducing CO2 emissions with such greenhouses requires that the CO2 concentration within them be estimated and accurately controlled. However, CO2 concentration is affected not only by various environmental factors but also by the photosynthesis and respiration of the plants grown in the greenhouse. Therefore, although many aspects of the greenhouse environment can be controlled, CO2 concentration has a complex nonlinear relationship with environmental variables.

ANNs have been used in recent studies to derive meaningful results from complex nonlinear data (Vaidyanathan 2015; Wang et al. 2016). ANNs are gaining popularity over other algorithms because they can learn high-level abstractions from raw data (LeCun et al. 2015). From the 1980s to the early 2000s, ANNs were small and used mainly for simple estimations. Since 2009, with the emergence of big data and of hardware with far greater computational power than earlier generations, ANNs have been applied to a wide range of fields. The purpose of this study was to estimate CO2 concentration in a greenhouse using an ANN trained on environmental data.

2 Materials and methods

2.1 Greenhouse and cultivation conditions

A double-span arch-type plastic house (34.4 m W × 30.0 m L × 5.7 m H; floor area 1,032 m2) located at Boryeong, Korea (36°23′34″N 126°29′12″E) was used for the experiment (Fig. 1). Polyolefin film 0.15 mm thick, with a light transmittance of approximately 92%, was used as the covering material. The inside temperature was maintained at 25 ± 1 °C with a hot-water heating system, and the vents opened automatically at a set point of 27 °C. One hundred 3-year-old Irwin mango trees (Mangifera indica L. 'Irwin') were grown in pots 0.8 m in diameter at a planting density of 6.25 m2 per tree. The organic matter content of the soil ranged from 38 to 120 g kg−1. Stem training and pruning were conducted periodically to induce vegetative growth and to establish tree structure. Watering was by drip irrigation.

Fig. 1 A double-span arch-type plastic house (34.4 W × 30.0 L × 5.7 H, m) used for Irwin mango cultivation located at Boryeong, Korea

2.2 Data collection and preprocessing

Environmental factors including temperature, relative humidity, light intensity, atmospheric pressure, and CO2 concentration were measured with a complex sensor module developed by the Korea Electronics Technology Institute (Seongnam, Korea). Sensor modules were placed at nine locations throughout the greenhouse. Measurements were taken every 10 min from July 27 to December 9, 2016, and the mean of the nine locations was used. Outside weather data (temperature, relative humidity, wind direction, wind velocity, and atmospheric pressure) measured at the Boryeong Meteorological Station were also used, and the measurement time was used as an additional input. The outside CO2 concentration was approximately constant at 410 μmol mol−1. Table 1 shows the ranges of the measured environmental factors. To improve the training efficiency of the ANN, the environmental data were normalized to the range 0 to 1. A total of 14,866 data points were used to estimate CO2 concentration with the ANN.
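The text states only that the data were scaled to the range 0 to 1. A common way to do this is per-variable min-max scaling; the sketch below assumes that choice, and the function names are hypothetical rather than taken from the study. Keeping each variable's minimum and maximum also lets the network's normalized CO2 output be mapped back to μmol mol−1.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(column: Sequence[float]) -> Tuple[List[float], float, float]:
    """Linearly rescale one variable to [0, 1] (assumed min-max scaling).

    Returns the scaled values together with the minimum and maximum used,
    so the same transform can be applied to new data or inverted later.
    """
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map every value to 0.
        return [0.0] * len(column), lo, hi
    return [(v - lo) / span for v in column], lo, hi


def denormalize(value: float, lo: float, hi: float) -> float:
    """Map a value on the [0, 1] scale back to the original units."""
    return lo + value * (hi - lo)


# Hypothetical inside-temperature readings, not data from the study.
scaled, lo, hi = min_max_normalize([22.5, 25.0, 27.5])
print(scaled)  # [0.0, 0.5, 1.0]
```

In a stricter setup the minimum and maximum would be taken from the training data only and reused for the test data; the text does not say which was done.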

Table 1 Ranges of the measured environmental factors

2.3 Artificial neural network (ANN)

The ANN consisted of an input layer, hidden layers, and an output layer, each made up of neurons. In this study, the input layer had 10 neurons corresponding to the environmental data, followed by two hidden layers and an output layer with one neuron for CO2 concentration (Fig. 2). In the hidden layers, the values received from the previous layer were multiplied by weights, summed, and passed through an activation function. The rectified linear unit (ReLU) was used as the activation function f(x), where x is the input to each neuron (Eq. 1, Fig. 3).

$$f\left( x \right) = \max\left( 0, x \right)$$
(1)
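The layer computation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the study's TensorFlow code: each neuron is assumed to have a bias term, the output neuron is assumed to be linear (no ReLU), and the weight initialization is hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(x: float) -> float:
    """Eq. 1: f(x) = max(0, x)."""
    return max(0.0, x)


def dense(inputs: Sequence[float], layer: Layer,
          activation: Optional[Callable[[float], float]] = None) -> List[float]:
    """Fully connected layer: weighted sum of inputs plus bias, then activation."""
    weights, biases = layer
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        out.append(activation(z) if activation else z)
    return out


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical initialization: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_hidden: int, n_inputs: int = 10, seed: int = 0) -> List[Layer]:
    """10 inputs -> two equal-width ReLU hidden layers -> 1 output (as in Fig. 2)."""
    rng = random.Random(seed)
    return [init_layer(n_inputs, n_hidden, rng),
            init_layer(n_hidden, n_hidden, rng),
            init_layer(n_hidden, 1, rng)]


def forward(network: List[Layer], x: Sequence[float]) -> float:
    """Estimate (normalized) CO2 concentration from 10 normalized inputs."""
    h1 = dense(x, network[0], relu)
    h2 = dense(h1, network[1], relu)
    return dense(h2, network[2])[0]  # linear output neuron (assumed)
```

For example, `forward(build_network(256), [0.5] * 10)` runs one untrained pass through a 10-256-256-1 network; training would then adjust the weights and biases.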

To train and test the ANN, the 14,866 data points were randomly divided into 10,000 training and 4,866 test data points. The training data were used to adjust the ANN according to the differences between estimated and measured values, and the test data were used to confirm the accuracy of the trained ANN. The data sequence was shuffled beforehand so that neither set was concentrated in a particular period. The experiments were performed with TensorFlow (v. 0.11, Python deep learning library; Google, Mountain View, CA, USA). The root mean square error (RMSE; Eq. 2) was used as the objective minimized while adjusting the ANN.
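The shuffle-then-split procedure can be sketched as follows. The function name and seed are hypothetical; each record stands for one 10-min observation (inputs together with the measured CO2), here replaced by integer indices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(records: Sequence[T], n_train: int,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle all records, then split them into training and test sets.

    Shuffling first keeps either set from being concentrated in one period
    of the measurement campaign.
    """
    if not 0 < n_train < len(records):
        raise ValueError("n_train must leave at least one record in each set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in record indices for the 14,866 observations.
train, test = shuffle_split(range(14866), n_train=10000)
print(len(train), len(test))  # 10000 4866
```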

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( Y_{i} - P_{i} \right)^{2}}$$
(2)

where $Y_i$ and $P_i$ are the CO2 concentrations measured by the sensor and estimated by the ANN, respectively, n is the number of data points evaluated (the training data during optimization), and i indexes each data point. The coefficient of determination (R2) on the training and test data was used as the training and test accuracy, respectively, to verify model robustness.
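Eq. 2 and the coefficient of determination can be computed as below. The text does not state which definition of R2 was used; this sketch assumes the common form 1 − SS_res/SS_tot rather than the squared correlation coefficient. The sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Eq. 2: square root of the mean squared difference Y_i - P_i."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, estimated)) / n)


def r_squared(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, estimated))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical CO2 values in umol/mol, not data from the study.
y_meas = [400.0, 450.0, 500.0]
y_est = [410.0, 440.0, 500.0]
print(round(rmse(y_meas, y_est), 2), round(r_squared(y_meas, y_est), 2))  # 8.16 0.96
```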

Fig. 2 Structure of the artificial neural network, including the input layer (I), hidden layer I (H1), hidden layer II (H2), and output layer (O). The inside and outside environmental factors used as inputs for I are listed

Fig. 3 A rectified linear unit (described in Eq. 1) was used as the activation function in each node of the artificial neural network

For comparison with the ANN, a multivariate linear regression (Eq. 3) was fitted to the same data set using the statistical programming language R (R Foundation for Statistical Computing, Vienna, Austria).

$$y = \sum_{i} a_{i} x_{i} + b$$
(3)

where y is the dependent variable (CO2 concentration), $x_i$ are the independent variables (environmental factors), $a_i$ are the regression coefficients, and b is the intercept. The subscript i indexes the independent variables.
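Eq. 3 is ordinary least squares. The study used R, where `lm()` fits this model via a QR decomposition; the plain-Python sketch below instead solves the normal equations directly, which is adequate for about ten inputs but less numerically robust. The function name and the noise-free example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_linear_regression(X: Sequence[Sequence[float]],
                          y: Sequence[float]) -> Tuple[List[float], float]:
    """Ordinary least squares for y = sum_i a_i * x_i + b (Eq. 3).

    Solves the normal equations (A^T A) w = A^T y, where A is X with a
    column of ones appended for the intercept, by Gauss-Jordan elimination
    with partial pivoting. Returns (coefficients a_i, intercept b).
    """
    A = [list(row) + [1.0] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    w = [M[i][k] for i in range(k)]
    return w[:-1], w[-1]


# Hypothetical data generated from y = 3*x1 - 2*x2 + 5 with no noise.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [5.0, 8.0, 3.0, 6.0, 9.0]
coefs, intercept = fit_linear_regression(X, y)
```

With the 10 inputs of this study, each row of `X` would hold the normalized or raw environmental factors and `y` the measured CO2 concentration.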

To train the ANN, the Adam optimizer (AdamOptimizer in TensorFlow), a widely used optimization method (Kingma and Ba 2014), was used with commonly used parameter values (Table 2). Training ran for 5,000 epochs in total, where one epoch is a single pass through the entire training data set. To determine the optimal ANN structure, the number of neurons in each hidden layer was varied among 32, 64, 128, 256, 512, 1024, and 2048, with both hidden layers always having the same number of neurons.
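Adam keeps running estimates of the mean and uncentered variance of each parameter's gradient and scales each step accordingly. The sketch below implements the update rule of Kingma and Ba (2014) for a flat list of parameters. The defaults (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are those suggested in that paper and are also the defaults of TensorFlow's `tf.train.AdamOptimizer`; that they match Table 2 exactly is an assumption. The usage example minimizes a toy quadratic, not the ANN loss.

```python
import math
from typing import List, Sequence


class Adam:
    """Adam update rule for a flat list of parameters."""

    def __init__(self, n_params: int, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentered variance) estimates
        self.t = 0                 # number of steps taken

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        """Return updated parameters after one Adam step."""
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            self.v[i] = b2 * self.v[i] + (1 - b2) * g * g
            m_hat = self.m[i] / (1 - b1 ** self.t)  # bias-corrected mean
            v_hat = self.v[i] / (1 - b2 ** self.t)  # bias-corrected variance
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
opt = Adam(n_params=1, lr=0.1)
w = [0.0]
for _ in range(2000):
    w = opt.step(w, [2.0 * (w[0] - 3.0)])
```

Because each step is normalized by the running gradient magnitude, the first step has a size of about `lr` regardless of the gradient's scale.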

Table 2 Parameters of the AdamOptimizer used to train the artificial neural network

3 Results and discussion

3.1 Accuracy of the artificial neural network

The highest test accuracy (R2 = 0.97; RMSE = 19.90 μmol mol−1) was obtained with 256 neurons in each hidden layer (Table 3). When an ANN has too many neurons relative to the amount of training data, it fits the training data closely but fails to generalize to new data, a problem called overfitting (Tetko et al. 1995). When the number of neurons exceeded 256, the training accuracy increased while the test accuracy decreased, indicating overfitting. Therefore, enlarging the ANN does not increase estimation accuracy indefinitely.
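To put the hidden-layer sizes in perspective relative to the 10,000 training data points, the number of trainable parameters of the 10-n-n-1 structure can be counted, assuming every neuron has a bias. For n = 256 this gives n² + 13n + 1 = 68,865 parameters, and each doubling of the width roughly quadruples the count. Parameter count is only a coarse indicator of overfitting risk; the test accuracies in Table 3 are the actual evidence.

```python
def count_parameters(n_hidden: int, n_inputs: int = 10, n_outputs: int = 1) -> int:
    """Weights plus biases of a network with two equal-width hidden layers."""
    layer1 = n_inputs * n_hidden + n_hidden    # input -> hidden layer I
    layer2 = n_hidden * n_hidden + n_hidden    # hidden layer I -> hidden layer II
    layer3 = n_hidden * n_outputs + n_outputs  # hidden layer II -> output
    return layer1 + layer2 + layer3


for n in (32, 64, 128, 256, 512, 1024, 2048):
    print(n, count_parameters(n))
```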

Table 3 Training and test accuracies of the artificial neural network according to the number of neurons in the hidden layers

In comparison, the multivariate linear regression model (Eq. 4) had an R2 of 0.78 and an RMSE of 54.70 μmol mol−1, indicating that the ANN estimated CO2 concentration more accurately than the multivariate linear regression (Fig. 4).

$$\begin{aligned} C_{i} = & -55.87\,t + 11.28\,T_{i} + 1.760\,RH_{i} - 241.7\,P_{i} - 0.05812\,L_{i} - 9.114\,T_{o} + 1.010\,RH_{o} \\ & + 0.04140\,D_{o} - 0.7791\,v_{o} + 243.8\,P_{o} - 1237 \end{aligned}$$
(4)

where C, t, T, RH, P, L, D, and v are the CO2 concentration, time, temperature, relative humidity, atmospheric pressure, light intensity, wind direction, and wind velocity, respectively. The subscripts i and o denote inside and outside the greenhouse, respectively.
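For reference, Eq. 4 can be evaluated with a small function. The coefficients are transcribed from Eq. 4, but the input units and encodings (for example, how time t and wind direction D_o are represented, and whether the inputs are raw or normalized) are not specified in the text, so this helper and its key names are hypothetical and meaningful outputs depend on supplying inputs in the study's original form.

```python
from typing import Mapping

# Coefficients transcribed from Eq. 4 (suffix _i: inside, _o: outside).
EQ4_COEFFICIENTS = {
    "t": -55.87, "T_i": 11.28, "RH_i": 1.760, "P_i": -241.7, "L_i": -0.05812,
    "T_o": -9.114, "RH_o": 1.010, "D_o": 0.04140, "v_o": -0.7791, "P_o": 243.8,
}
EQ4_INTERCEPT = -1237.0


def eq4_co2(inputs: Mapping[str, float]) -> float:
    """Evaluate the fitted regression of Eq. 4 for one set of inputs."""
    missing = EQ4_COEFFICIENTS.keys() - inputs.keys()
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return EQ4_INTERCEPT + sum(a * inputs[name] for name, a in EQ4_COEFFICIENTS.items())
```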

Fig. 4 Comparison of estimated and measured CO2 concentrations in the greenhouse when using 256 neurons in the hidden layers of the artificial neural network and multivariate linear regression (Table 3)

The ANN was less accurate at CO2 concentrations of about 500–600 μmol mol−1. The measured CO2 concentration ranged from 337.0 to 794.5 μmol mol−1, but few data were available in the 500–600 μmol mol−1 range. The ANN therefore may not have estimated CO2 accurately in this range because of insufficient training data there.
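Sparse coverage of a concentration range can be checked by counting measured values per bin. A minimal sketch; the bin width, the bounds, and the example readings are illustrative choices, not the study's procedure or data.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def bin_counts(values: Iterable[float], lo: float, hi: float,
               width: float) -> Dict[float, int]:
    """Count values in consecutive bins of `width` from lo to hi.

    Keys are the lower bin edges; the last bin also includes hi itself.
    Values outside [lo, hi] are ignored.
    """
    n_bins = math.ceil((hi - lo) / width)
    counts: Counter = Counter()
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) // width), n_bins - 1)
            counts[idx] += 1
    return {lo + i * width: counts[i] for i in range(n_bins)}


# Hypothetical CO2 readings (umol/mol), not data from the study.
print(bin_counts([310, 320, 520, 790, 800], lo=300, hi=800, width=100))
# {300: 2, 400: 0, 500: 1, 600: 0, 700: 2}
```

Bins with few counts relative to their neighbors mark ranges where the network has little training data to learn from.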

3.2 Validation of CO2 concentration in the greenhouse

Overall, the CO2 concentrations estimated by the ANN agreed more closely with those measured in the greenhouse than did those estimated by the multivariate linear regression (Fig. 5). Compared with the ANN, the multivariate regression model deviated from the measured CO2 concentrations by about 100 μmol mol−1 on October 15 and 16.

Fig. 5 Comparison of CO2 concentrations estimated using the artificial neural network (ANN), multivariate linear regression, and measured values in a single greenhouse between October 10–16, 2016

The ANN accurately estimated the changes in greenhouse CO2 concentration that accompanied changes in inside temperature and relative humidity, even though vent position data were not used as an input (Fig. 6). We infer that the ANN recognized changes in vent position from sudden changes in relative humidity and then estimated the inside CO2 concentration from the ventilation and the outside CO2 concentration. Because an ANN is a black-box model, it is difficult to determine exactly which environmental factors influenced the results. Nevertheless, the ANN accurately estimated changes in CO2 concentration even though ventilation affected various environmental factors.

Fig. 6 Estimated and measured CO2 concentrations (a); inside temperature, relative humidity, and PPFD (photosynthetic photon flux density) (b); and vent position (c) over 24 h starting at 06:30 on October 13, 2016. Vent positions represent the opening ratio of the windows (0%, closed; 100%, fully open)

3.3 Limitations and possibilities

The estimates in this study were limited to data from a single greenhouse. ANNs should be trained with data from various measurement sites to generalize across the range of possible conditions (Lopez et al. 2001). Previous studies that achieved high accuracy used more data points or more inputs related to the estimated variable (Trejo-Perea et al. 2009). Where conditions are difficult to measure, virtual conditions could be generated by simulation (Beltramo et al. 2016). In this study, CO2 concentration in the greenhouse at Boryeong was estimated with a high coefficient of determination of 0.97. To confirm that the ANN model used in this study is applicable to other greenhouses, its test accuracy must be verified with data from other greenhouses or with simulation data. Despite these experimental limitations, the ANN estimated changes in greenhouse CO2 concentration with good accuracy, showing that CO2 concentration within a greenhouse can be estimated with an ANN that uses nine environmental factors and the measurement time as inputs. This suggests that CO2 concentration may also be estimated in greenhouses under CO2 fertilization (Fernandez and Bailey 1992). Further studies are needed to estimate CO2 consumption by plants in greenhouses using ANNs.