1 Introduction

Edible mushrooms are highly valued for their nutritional benefits, which encompass reducing the risk of cancer, enhancing antioxidant levels, bolstering immunity, improving neurocognition, and augmenting Vitamin D intake [1]. Mushrooms hold a significant position as a vital food source in Australia, contributing substantially to the economy: as of June 2022, the country had produced 66,236 tonnes of mushrooms, with a production value of $434.2 million [5]. Small mushroom companies aspire to monitor the growth of mushrooms to improve growth conditions, enable timely and efficient harvesting, reduce waste, and optimize labor allocation; however, the expense associated with manual labeling is often unaffordable for them. Driven by industrial demand and a scarcity of relevant research in Australia, this paper explores computer vision technologies for monitoring mushroom growth. Because oyster mushrooms have low environmental control prerequisites and limited vulnerability to pests and diseases affecting the fruiting body [6], they can be cultivated in a straightforward and cost-effective manner. Accordingly, this study concentrates on monitoring the growth of oyster mushrooms using image classification algorithms.

In the realm of mushroom classification, prevailing studies predominantly focus on binary classification to differentiate between poisonous and edible mushrooms [8], or on multi-class classification for mushroom species categorization [12]. Notably, within the oyster mushroom domain, research endeavours have concentrated on aspects like freshness assessment [11, 17] and automated harvesting [13,14,15]. The growth stages of oyster mushrooms remain a scarcely explored avenue; even in the broader realm of plant growth, analogous endeavours are infrequent. Hence, the primary objective of this paper is to delve into potential solutions for effectively monitoring the growth of oyster mushrooms in a real-world setting.

Fig. 1. (a) Mushroom cultivation in a shipping container at a mushroom company; (b) three growth stages of the oyster mushroom: stage one is the early stage (top); stage two is the intermediate stage (middle); stage three is the mature stage (bottom).

Our work is a preliminary study exploring an automatic solution for a small company that grows exotic mushrooms in containers. There are two main challenges: (1) Obtaining data. Due to the lack of appropriate existing data for monitoring mushroom growth, acquiring data from a local small-scale mushroom company necessitates careful consideration of the optimal image collection device, the intricacies of image labeling, and the allocation of resources within a defined budget. (2) Identifying image features of mushrooms at different stages. Prior studies have predominantly concentrated on individual mushrooms positioned at the centre of the images, whereas this research extends the scope to capture a holistic representation of the growth stages in a complex setting. Our panoramic images (shown in Fig. 1(a)) encapsulate diverse mushroom stages dispersed across varying positions within each image.

To address the challenges, we design an oyster mushroom monitoring system consisting of image acquisition, cloud storage, label map, and applications. We collect a small dataset encompassing various growth stages of oyster mushrooms and collaborate with the staff at the mushroom company to label it. The dataset contains oyster mushroom images in three growth stages, as shown in Fig. 1(b). To accomplish the objective of monitoring oyster mushroom growth, we introduce a label map that shows the stage information from the patches of a panoramic image. The essence of the label map lies in equitably partitioning a panoramic image into N patches and subjecting these patches to an optimal feature extractor and classifier, resulting in distinct labels. These labels are then organized into a vector, which is reshaped into a label map. Finally, by comparing machine learning and deep learning image classification algorithms, our experimental findings highlight VGG-16 as the optimal feature extractor and classifier architecture within our label map method, achieving an accuracy of 82.22%.

Our main contributions are:

  • We design a solution for classifying multiple oyster mushroom growth stages within a panoramic image in a real-world complex setting.

  • We perform preliminary studies on recognizing oyster mushroom growth stages by exploring both traditional machine learning and deep learning models.

  • We address the data gap by curating and meticulously labeling a dataset encompassing various growth stages of oyster mushrooms.

The remainder of this paper reviews related work (Sect. 2), details the research design (Sect. 3), and interprets the empirical studies (Sect. 4).

2 Related Works

Data. In recent years, research endeavors in the oyster mushroom domain have spanned a diverse array of subjects, encompassing valorization and waste management [20], automated harvesting [13,14,15], freshness evaluation [11, 17], grading assessment [21], growth enhancement [7], as well as IoT-based monitoring systems [19]. This research focuses on monitoring the growth stage of oyster mushrooms using image classification algorithms; however, work on oyster mushroom stage image classification is rare. The most relevant work is Surige et al. [19], in which the authors proposed to classify five different stages of the oyster mushroom life cycle: stage one (ten hours to harvest), stage two (five hours to harvest), stage three (harvest now), stage four (one day past, suitable for consumption) and stage five (two days past, not suitable for consumption). Our work condenses the five stages proposed in Surige et al. [19] into three distinct stages with revised descriptions, specifically highlighting the key phases that contribute to successful cultivation, as shown in Fig. 1(b). In the growth stages of oyster mushrooms, stage one is characterised by the readiness of mushroom grow kits for pinning or their presence in the pinning stage, where small pin-like structures emerge on the substrate as an early sign of mushroom development. Stage two is characterised by the pin-like structures growing into small mushrooms with caps spanning 3–4 cm, while stage three represents the maturation phase of the mushroom with a cap size ranging from 5 to 7 cm. Moreover, this paper aims to categorize the panoramic view of the entire oyster mushroom growing environment into three stages, which presents a more complex and challenging task compared to the classification of individual mushroom images.

Algorithm. In the past decade, computer vision has predominantly embraced deep learning algorithms, especially convolutional neural networks (CNNs). Regardless of architectural variations, CNNs fundamentally include convolutional layers (with or without ReLU activation and pooling) for feature extraction and fully connected layers for classification. Well-known architectures include the Visual Geometry Group network (VGG) [18], MobileNet [16] and the residual network (ResNet) [4]. Some researchers have designed systems to monitor or measure mushroom growth and used CNNs to recognize and localize mushrooms. Lu et al. proposed a growth measurement system for common mushrooms in greenhouses encompassing image capture, mushroom recognition (using CNNs), position correction, size measurement, growth rate estimation, quantity assessment, harvest time calculation, data recording, and harvest notifications throughout the mushroom fruiting phase [10]. Surige et al. developed an IoT-based monitoring system for oyster mushrooms featuring four functions: environmental monitoring utilizing long short-term memory (LSTM), harvest time detection using CNNs with the MobileNet V2 model, disease detection and control recommendation based on CNNs with MobileNet V2, and yield prediction employing LSTM [19]. This paper also proposes an oyster mushroom monitoring system with different components, including image acquisition, cloud storage, label map, and applications (shown in Fig. 2(a)). Zarifie et al. used a pretrained VGG-16 to extract features and classify grey oyster mushrooms into quality-based grades [21]. However, in the oyster mushroom domain, prior work draws not only on deep learning methodologies but also on machine learning methodologies (shown in Table 1).

Table 1. Oyster Mushroom Works

Some researchers recognized that colour, texture and morphology are important mushroom features, extracted these features manually via colour maps, and then used an ANN or a combination of ANN and SVM to classify the freshness of oyster mushrooms [11, 17]. Additionally, the Vision Transformer (ViT) [3], a deep learning algorithm that utilises self-attention mechanisms to extract features, inspired by Transformer models used in natural language processing, has gained popularity in the last three years. No existing paper explores ViT for oyster mushroom image classification. This paper delves into two distinct types of image classification algorithms. On one hand, given the small dataset and the significance of morphology as a feature, machine learning-based image classification algorithms exhibit promising potential. On the other hand, recognizing the subtle differences within and between classes, pretrained deep learning-based image classification algorithms may demonstrate exceptional discriminatory power in distinguishing various stages of oyster mushrooms.

Fig. 2. (a) Oyster mushroom monitoring system; (b) label map procedure: (1) Auto-cutter: automatically divide a panoramic image into \(P \times P\) patches; (2) arrange the patches in order from left to right, top to bottom, and have them labelled by mushroom company staff; (3) Feature extractor: extract features from each patch; (4) Classifier: classify the features into one of three distinct stages and generate the corresponding labels, represented as 0, 1, or 2; (5) concatenate the labels into a 1D vector and increment the label values by 1, aligning them with the corresponding stage indices (1, 2, 3); (6) Reconstructor: reshape the 1D vector into a \(P \times P\) label map.

3 The Monitoring System

To facilitate oyster mushroom growth monitoring, we introduce a comprehensive system illustrated in Fig. 2(a), comprising four key components: image acquisition, cloud storage, label map, and applications. Initially, panoramic oyster mushroom images are captured via a camera and transmitted to cloud storage over Wi-Fi. Subsequently, these cloud-stored images are processed through a supervised image classification method, referred to as the "label map", which selects an optimal model. Finally, this model is leveraged for applications. Given the unique challenges posed by panoramic images, discussed in Sect. 3.1, we employ the label map for panoramic monitoring, with detailed insights provided in Sect. 3.2.

3.1 The Problem

This study emphasises handling complex images that closely resemble real-world scenarios captured by cameras. In our scenario, oyster mushrooms are cultivated within bottles arranged on various tiers of shelves. These intricate panoramic images (shown in Fig. 1(a)) introduce several challenges for image recognition. Due to the limited perspective of the lens, occlusion becomes a challenge: only a portion of the front-row bottles is captured, while those positioned behind remain concealed from view. The cultivation environment also incorporates both natural and LED lighting, posing an additional challenge in terms of illumination. Moreover, viewpoint variation presents another challenge, given that the three shelves are arranged in a left, middle, and right configuration. The toughest challenge is that the oyster mushrooms within one image can span three distinct growth stages, further complicating the task of accurately identifying each stage within a panoramic image. Due to the large number of bottles and their closely spaced arrangement, identifying an individual bottle of mushrooms becomes difficult. Thus, we propose the label map as a solution to automatically identify the mushroom stages: we first split the panoramic images into patches, then classify the patches into different stages, and finally reconstruct the classification results to automatically monitor the panoramic images.

3.2 The Label Map

Instead of obtaining the stage information of individual mushroom bottles, the label map method achieves global modelling by integrating the stage information from patches. As depicted in Fig. 2(b), a panoramic RGB image (\(H \times W \times 3\)) is divided into \(P \times P\) patches (\(\frac{H}{P} \times \frac{W}{P} \times 3\)) by the auto-cutter. These patches are arranged from left to right, then top to bottom; because the patches carry positional information from the original image, their sequential order is important. Within these \(P \times P\) patches, each patch exclusively corresponds to a single growth stage rather than encompassing all three stages. The patches are then passed through the feature extractor and classifier sequentially, and each patch generates a corresponding label based on the probability outcome. After concatenating these labels, the output (\(1 \times P^2\)) is a label representation of the original image. To match the original stage indices, 1 is added to the output; this addition does not increase the model complexity. The reconstructor reshapes the output into a \(P \times P\) grid, yielding a label map that effectively delineates the growth stages present within the panoramic image. A related issue with this method is that a single patch may contain a combination of two stages, typically a mixture of stage one and stage two. To address this challenge, during the ground-truth labeling process, the mushroom company staff assign a label for only one stage, based either on the majority of mushroom stages within the patch or on the misclassification cost, assigning the patch to the stage with the lowest cost.
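For concreteness, the following sketch illustrates the label map procedure in Python with NumPy and PyTorch; the names `model` and `P`, and the omission of input normalisation, are illustrative assumptions rather than our released implementation.

```python
# A minimal sketch of the label map procedure, assuming `model` is a
# trained 3-way patch classifier (normalisation omitted for brevity).
import numpy as np
import torch

def label_map(image: np.ndarray, model: torch.nn.Module, P: int) -> np.ndarray:
    """Divide an H x W x 3 panoramic image into P x P patches, classify
    each patch, and reshape the labels into a P x P label map."""
    H, W, _ = image.shape
    h, w = H // P, W // P
    patches = []
    # Auto-cutter: traverse left to right, then top to bottom, so each
    # position in the 1D label vector keeps its spatial meaning.
    for i in range(P):
        for j in range(P):
            patches.append(image[i * h:(i + 1) * h, j * w:(j + 1) * w])
    # Stack into a batch of shape (P*P, 3, h, w) expected by the model.
    batch = torch.from_numpy(np.stack(patches)).permute(0, 3, 1, 2).float()
    with torch.no_grad():
        logits = model(batch)                  # (P*P, 3) stage scores
        labels = logits.argmax(dim=1).numpy()  # labels in {0, 1, 2}
    # Shift labels to stage indices {1, 2, 3} and reshape into the map.
    return (labels + 1).reshape(P, P)
```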

To measure the distance between the true growth stage probabilities \(\textbf{y}\) and the predicted growth stage probabilities \(\mathbf {\hat{y}}\), this paper uses the multi-class cross entropy loss function shown in Eq. (1), where M is the number of panoramic samples, C is the number of classes, and \(p_{\theta }({y}_{ij}|\textbf{x}_{i})\) is the predicted probability of a specific class j for sample \(\textbf{x}_{i}\).

$$\begin{aligned} CE(\textbf{y}, \mathbf {\hat{y}}) = -\sum _{i=1}^{M} \textbf{y}_{i} \log (\mathbf {\hat{y}}_{i}) = -\sum _{i=1}^{M}\sum _{j=1}^{C} y_{ij} \log (p_{\theta }({y}_{ij}|\textbf{x}_{i})) \end{aligned}$$
(1)
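As an illustration, Eq. (1) is the standard multi-class cross entropy; the PyTorch sketch below, with placeholder logits and targets, verifies that the library call matches the explicit one-hot form.

```python
# A worked sketch of Eq. (1) with PyTorch, assuming C = 3 growth stages.
# `logits` and `targets` are illustrative placeholders.
import torch
import torch.nn.functional as F

logits = torch.randn(8, 3)            # raw scores for 8 patches, 3 stages
targets = torch.randint(0, 3, (8,))   # ground-truth stage indices 0..2

# F.cross_entropy combines log-softmax and negative log-likelihood,
# i.e. -sum_j y_ij * log p_theta(y_ij | x_i), averaged over samples.
loss = F.cross_entropy(logits, targets)

# Equivalent explicit form with one-hot targets, matching Eq. (1):
log_p = F.log_softmax(logits, dim=1)
y_onehot = F.one_hot(targets, num_classes=3).float()
loss_manual = -(y_onehot * log_p).sum(dim=1).mean()
assert torch.allclose(loss, loss_manual)
```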

To minimize the loss, our primary objective shifts towards identifying an optimal feature extractor and classifier within the label map method. This optimal model needs to be robust against challenges such as occlusion, varying illumination, and changes in viewpoint. In this paper, we explore existing machine learning and deep learning-based image classification algorithms to find the optimal feature extractor and classifier for our label map method.

Machine Learning Based Image Classification Algorithm. We first explore a classic machine learning solution, in which we use the scale-invariant feature transform (SIFT) [9] to extract features and then apply a support vector machine (SVM) [2] to classify the patches. For M panoramic RGB images, a total of \(M \times P \times P\) RGB patches can be obtained. These patches constitute a dataset, which is subsequently divided into training and testing data. To obtain SIFT features, we convert the three-dimensional RGB patches into two-dimensional grayscale images, as the SIFT technique [9] operates on grayscale images. For each patch, SIFT [9] detects keypoints across the levels of a Gaussian pyramid, in which multi-scale versions of the patch are produced by Gaussian smoothing and downsampling. Subsequently, SIFT [9] computes gradients within a \(16 \times 16\) window centred on each identified keypoint, generating an orientation histogram in vector form to construct a keypoint descriptor, thus creating the SIFT features. Next, we apply k-means clustering to establish a visual vocabulary (Bag of Features, BoF) from the training SIFT features. Then, we associate each SIFT descriptor of a patch with the closest visual word in the BoF vocabulary and create a visual word histogram for the patch. Later, we combine these BoF histograms into a unified feature matrix for both training and testing. Finally, we use the SVM classifier [2] to find the hyperplanes that maximally separate the three classes while minimizing classification errors.
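The pipeline above can be sketched with OpenCV and scikit-learn as follows; the vocabulary size K, the kernel choice, and the `train_patches`/`train_labels` variables are illustrative assumptions rather than our exact settings.

```python
# A condensed sketch of the SIFT + Bag-of-Features + SVM pipeline.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

K = 100                                  # visual vocabulary size (assumed)
sift = cv2.SIFT_create()

def sift_descriptors(patch_bgr):
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)  # SIFT needs grayscale
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def bof_histogram(desc, kmeans):
    hist = np.zeros(K, np.float32)
    if len(desc):
        # Map each 128-D descriptor to its nearest visual word.
        for word in kmeans.predict(desc):
            hist[word] += 1
        hist /= hist.sum()               # normalise across patch sizes
    return hist

# train_patches / train_labels are assumed to come from the patch dataset.
train_desc = [sift_descriptors(p) for p in train_patches]
kmeans = KMeans(n_clusters=K, n_init=10).fit(np.vstack(train_desc))
X_train = np.array([bof_histogram(d, kmeans) for d in train_desc])
svm = SVC(kernel="rbf").fit(X_train, train_labels)
```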

Deep Learning Based Image Classification Algorithm. In terms of deep learning based image classification algorithms, this paper uses pretrained VGG-16 [18], ResNet18 [4], ResNet34 [4], ResNet50 [4], MobileNetV2 [16] and ViT-B-16 [3]. All models are pretrained on the ImageNet-1k dataset and then fine-tuned on the Oyster Mushroom dataset. The \(P \times P\) patches from a panoramic image can be fed to the model as one batch of the whole \(M \times P \times P\) dataset, which eliminates the need for further image processing and does not increase the computational complexity. VGG [18] employs sequential \(3 \times 3\) convolution/ReLU blocks and \(2 \times 2\) max pooling, progressively increasing the channel count from 64 to 512 and culminating in three fully connected layers, yielding networks with 16–19 layers. ResNet [4] first passes a patch image through a convolutional layer that detects basic features such as edges and corners, then feeds it forward through several residual blocks. These blocks consist of multiple convolutional layers with shortcut connections, which perform identity mapping and add the result to the output of the stacked layers. Finally, ResNet [4] ends with global average pooling and a fully connected layer to classify the input. In contrast to VGG [18], ResNet [4] has a similar structure but is significantly deeper, ranging from 18 to 152 layers; its direct connections across convolutional layers address the accuracy degradation problem as layers increase. MobileNetV2 [16] begins with an initial convolutional layer for extracting low-level features from a patch image, followed by seven bottleneck residual blocks with varying strides. Each block combines pointwise and depthwise convolutional layers to capture spatial information while managing computational efficiency. The architecture concludes with a \(1 \times 1\) convolutional layer, a global average pooling layer, and a fully connected linear classifier with dropout. ViT-B-16 [3] converts an oyster mushroom patch image into a sequence of 2D \(16 \times 16\) patches, flattens them, and processes them through a linear projection layer. Position embeddings and an additional class token are incorporated, and the sequence is forwarded through multiple Transformer encoder blocks, whose self-attention mechanism captures global context. Lastly, the class token is fed through an MLP head (a two-layer classification network) to predict the stage.
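As an illustration of the fine-tuning step shared by these models, the following sketch adapts a pretrained VGG-16 to our three stages via torchvision; the optimiser choice, learning rate, and `train_loader` are assumptions, since the exact training hyperparameters are not the focus here.

```python
# A minimal fine-tuning sketch for pretrained VGG-16 with torchvision.
import torch
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
# Replace the final ImageNet classifier head with a 3-way stage head.
model.classifier[6] = torch.nn.Linear(4096, 3)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, targets in train_loader:      # train_loader is assumed
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
```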

4 Empirical Studies

4.1 Settings

For image acquisition in the monitoring system, images for this research were collected from two different shipping containers at a small mushroom company, where oyster mushrooms were cultivated under controlled environmental conditions (temperature: 18–22 \(^{\circ }\)C, humidity: 70–90%, \(CO_2\) levels: 800–1500 ppm). The shipping containers were illuminated with RGB LED strip lights. The panoramic images were captured using a Tapo C310 IP camera connected via Wi-Fi. Due to the unstable Wi-Fi signal, the camera often went offline, preventing image capture; variations in natural and LED lighting conditions could also blur the images. Moreover, considering the approximately 14-day life cycle of oyster mushrooms, detecting growth changes occurring within 1-hour intervals proves challenging for human observation. Owing to these constraints, our current dataset consists of images approximating the patches divided from panoramic images. These images were captured using an iPhone 11 in the high-efficiency HEIC format, with a resolution of \(4,032 \times 3,024\) pixels in the RGB colour space, and at varying distances between the lens and the samples. The dataset is balanced and currently contains 150 images (image capture is ongoing). We used 70% for training and 30% for testing.
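A minimal sketch of the 70/30 split is given below, assuming the patch images are organised into stage-named folders and converted from HEIC into a PIL-readable format; the folder name, input size, and seed are illustrative.

```python
# A sketch of the 70/30 train/test split, assuming stage-named folders
# readable by ImageFolder (HEIC images converted beforehand).
import torch
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),        # input size assumed for the models
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("oyster_mushroom_patches", transform=tfm)

n_train = int(0.7 * len(dataset))         # 70% training, 30% testing
train_set, test_set = torch.utils.data.random_split(
    dataset, [n_train, len(dataset) - n_train],
    generator=torch.Generator().manual_seed(0))  # fixed seed, illustrative
```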

4.2 Performances

After training and testing the models, we used Accuracy, Macro Precision, Macro Recall, Macro F1 and Macro Area Under the Receiver Operating Characteristic Curve (AUC-ROC) for evaluation.
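These metrics can be computed with scikit-learn as sketched below, where `y_true`, `y_pred`, and `y_score` denote the held-out labels, predicted labels, and per-class probabilities from a given model; the one-vs-one setting for AUC-ROC matches Fig. 3.

```python
# A sketch of the evaluation metrics with scikit-learn; y_true, y_pred,
# and y_score are assumed to come from the held-out test patches.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

acc = accuracy_score(y_true, y_pred)
pre = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
# Macro AUC-ROC averaged over one-vs-one class pairs.
auc = roc_auc_score(y_true, y_score, multi_class="ovo", average="macro")
```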

Table 2. Model Performances. Acc. denotes accuracy, Pre. means precision and Rec. represents recall.

As Table 2 shows, the deep learning-based feature extractors and classifiers surpass the traditional SIFT-SVM methodology; even the worst-performing deep learning model exhibits superior results. One factor is that SIFT-SVM converts three-channel RGB images into grayscale, losing valuable colour information through dimension reduction. Another factor lies in transfer learning: the pretrained models have learnt rich features from huge datasets, so they already carry useful information, and fine-tuning injects domain knowledge.

Among the deep learning models, the CNN variants (VGG-16, ResNet18, ResNet34, ResNet50 and MobileNetV2) demonstrate higher accuracy than the transformer-based ViT-B-16. Essentially, the inherent inductive biases of CNNs, such as translation equivariance and locality, outperform the self-attention mechanism of ViT-B-16 on this small dataset. When features are extracted at an earlier layer, the translation equivariance principle ensures the network's response remains consistent for the same image patch regardless of its position, and the locality principle ensures the network focuses on local regions without attending to distant ones. As the channel count and layer depth increase, the features capturing local information are aggregated to make predictions. Hence, CNNs can capture fine-grained image details, which are crucial given the subtle intra-class and inter-class variations. ViT-B-16 would have to acquire these inductive biases by training on a large dataset.

Fig. 3. ROC curves for one-vs-one multiclass classification.

Among the CNN variants, VGG-16 stands out as the top performer with an accuracy of 82.22%, slightly surpassing ResNet50 in terms of macro F1 and macro AUC-ROC scores, at 82% and 92.07% respectively. In our case, different misclassifications incur varying costs. For example, misclassifying stage two as stage three bears fewer adverse consequences because both warrant harvest. Conversely, misclassifying stage three as stage two carries greater repercussions, given the diminished quality, loss of nutrients, and reduced selling potential, underscoring the significance of accurate classification. Furthermore, misidentifying stage one as stage two not only squanders labour time but also complicates labour scheduling and rearrangement. Due to these varied misclassification costs and the need to discriminate between minute intra-class and inter-class variances, both macro F1 and macro AUC-ROC are crucial metrics. The macro F1 score represents the harmonic mean of precision and recall averaged across all stages of mushroom growth; it accounts for both false positives and false negatives for each individual stage, providing a balanced assessment of the model's performance across all classes. Thus, a higher macro F1 score indicates a better balance between accurately identifying the different growth stages and minimizing the overall misclassification rate. Additionally, the Receiver Operating Characteristic (ROC) curve (depicted in Fig. 3) illustrates the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) as the classification threshold for each stage is varied, showing how well the model distinguishes between positive and negative samples for each class. The area under the ROC curve (AUC-ROC) quantifies the model's overall performance, with a higher value indicating better discriminatory power between the growth stages of oyster mushrooms. As Fig. 3 shows, the VGG-16 classifier effectively discriminates between stages one and three with an AUC score of 1.00, while distinguishing between stages two and three proves the most challenging with an AUC score of 0.85. This observation is logical: the delicate pin-like structures of stage one are distinctly different from the matured mushrooms of stage three, whereas stage two is close to the maturation process of stage three. Therefore, we select VGG-16 as the optimal feature extractor and classifier for our dataset.
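The one-vs-one ROC analysis of Fig. 3 can be reproduced as sketched below: for each pair of stages, samples of the remaining stage are dropped and a binary ROC curve is computed. Here `y_true` and `y_score` are NumPy arrays as in the metrics sketch above; this is one common way to compute pairwise ROC, not necessarily the exact procedure used for the figure.

```python
# A sketch of one-vs-one ROC curves over the three growth stages.
from itertools import combinations
import numpy as np
from sklearn.metrics import roc_curve, auc

for a, b in combinations(range(3), 2):   # stage index pairs (0,1), (0,2), (1,2)
    mask = np.isin(y_true, [a, b])       # keep only samples of stages a and b
    # Binary ROC: class b as the positive class, scored by its probability.
    fpr, tpr, _ = roc_curve(y_true[mask] == b, y_score[mask, b])
    print(f"stages {a + 1} vs {b + 1}: AUC = {auc(fpr, tpr):.2f}")
```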

5 Conclusion and Future Works

In conclusion, this study introduces the Oyster Mushroom dataset, encompassing three distinct growth stages of the oyster mushroom, and addresses the challenge of classifying panoramic images through the label map method within our monitoring system, achieving an accuracy of 82.22%. In future endeavours, we aim to explore alternative approaches for predicting harvest timing, including treating growth stage images as time series data and leveraging techniques such as regression or recurrent neural networks.