1 Introduction

Agriculture is the main source of food and fiber and supplies the nutrition that the world's inhabitants need. It is also vital to the economy: a significant share of the global workforce is employed in the agriculture sector, including 50% of the Indian population. The economic survey estimates that agriculture's share of the GDP for the fiscal year 2020–2021 was 19.9% (footnote 1). The population has grown rapidly over the past three decades and is projected to reach almost 10 billion by 2050, according to a U.N. study, which has led to a sharp rise in food demand [1]. Conventional food production methods are insufficient to feed such a vast population.

Hence, there is a need for innovative and intelligent ways of farming. Smart and precision farming is one of the best solutions for meeting the current food demands of a large population. Precision agriculture automates various agricultural practices, which reduces time and labor costs. Artificial intelligence (AI) and the Internet of Things (IoT) are advanced technological tools that help to digitize agricultural activities. To cut waste, boost revenues, and protect the environment, precision agriculture manages each crop production input (water, fertilizer, herbicide, seed, pesticide, etc.) on a site-specific basis (Ess & Morgan 2013). Precision agriculture has a wide application area: crop management, weather forecasting, weed and pest control, intelligent spraying, livestock farming, remote sensing, storage management, innovative harvesting, and so on. In particular, it helps to detect weeds and pests in crop fields and supports site-specific spraying of herbicides and pesticides to remove them.

Weeds are a fundamental problem that affects crop yields to a large extent. Weeds are unwanted plants that compete with valuable plants for nutrients, water, and other resources, so they need to be removed from fields. The type of weeds depends on the location, season, and crop, and weeds vary from country to country. The weeds found in Germany, for example, are not necessarily found in India. Many datasets are available from various countries, but from the Indian perspective there is a lack of weed/crop datasets.

In this paper, we therefore provide a thoroughly annotated and masked crop/weed dataset from potato fields. The dataset contains 270 images manually annotated using VIA (VGG Image Annotator) [30]. The annotations made available with this dataset enable the development of weed detection and classification solutions and many types of image processing, including edge detection, motion detection, and noise reduction. The information presented is crucial from a computer vision standpoint. On the one hand, image collection in the agricultural industry is challenging: it requires sophisticated hardware systems, access to fields, and suitable lighting conditions, and the timing of the acquisition must be accurate and linked with the crop growth cycle (only once a year for many crops). On the other hand, defining appropriate ground truth requires the assistance of agricultural professionals [2].

The dataset comprises field images in a top-down view that were acquired with a Sony CyberShot W830 20.1 MP camera and a mobile phone camera. The images were collected from the precision farming potato fields of Punjab Agriculture University in Punjab, India. The crop was photographed at a stage of development at which many true leaves were visible. Manual weeding was done in this field a few hours after data collection. Here, we focus on potatoes, but wheat, peas, onions, and other crops also need manual weed control procedures. Every image has annotations, and the dataset contains a crop/weed annotation JSON file, a CSV file, a COCO-format file, and an annotation mask for each image. The dataset is available online at https://www.kaggle.com/datasets/rajni88/indianpotatoweed-dataset. Fig. 1 provides sample images from the dataset.

Fig. 1. Sample images from the dataset

2 Literature Survey

In general, there is a lack of open datasets accessible to researchers and academicians. Datasets are the fuel for classification and detection problems in machine learning [3]. Machine learning techniques are used in a variety of agricultural applications, including crop disease detection, weed classification and identification, plant seedling classification, fruit identification and counting, management of water resources and soil, and weather (climate) forecasting [3,4,5,6]. Accurately classifying and detecting weed species in their natural environment may be the most significant barrier to the general adoption of robotic weed management.

Table 1. Public Crop/Weed datasets
Fig. 2. Polygon annotation of images. Yellow specifies crop and blue specifies weed.

The more data these datasets contain, the more effectively artificial intelligence systems can guide robotic weed control, monitor plant growth accurately, and allocate scarce resources. The potato/weed dataset [8] is an open-access dataset with 411 images taken from potato fields. However, this dataset contains separate images for crop and weed, so it cannot be used for segmentation problems; it is suitable for classification problems only. Another dataset for weed detection has 202 images [9] that can be used for classification problems in deep learning. A further dataset, CWFID [10], has 60 images and is available on GitHub for crop/weed classification and segmentation for computer vision in precision agriculture.

Sudras et al. [11] annotated 1118 images covering six food crops and eight weed species from different locations in Latvia. DeepWeeds [12] is an extensive dataset of 17,509 images taken from different crop fields in Australia. Table 1 lists the datasets available online from fields in other countries for different crops.

This paper aims to provide a real-world image dataset for image segmentation and classification models such as the Faster Region-based Convolutional Neural Network (Faster R-CNN) and Mask R-CNN. It gives researchers a basis for studying data acquisition and weed treatment in potato fields.

3 Problem Description

The data presented in this paper show how the dataset is distributed between food crops and weeds. The crop selected for this work is potato. The 270 images presented in this paper were manually annotated using the VGG Image Annotator (VIA) tool. The dataset is split into train and val folders in an 80:20 ratio. Each folder contains a JSON file with the annotations. The raw images and a mask for each image are also included in the dataset. Figure 2 displays images from the dataset with polygon annotations, with yellow specifying crop and blue specifying weed (better visible in the color image).
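Detection models such as Faster R-CNN typically consume bounding boxes rather than polygons, so polygon annotations are often converted before training. The following is a minimal sketch of that conversion, assuming each polygon is given as two lists of pixel coordinates; the function names and the example polygon are hypothetical and are not taken from the dataset. The box layout [x, y, width, height] follows the COCO convention.

```python
def polygon_to_bbox(xs, ys):
    """Return a COCO-style box [x_min, y_min, width, height] around a polygon."""
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def polygon_area(xs, ys):
    """Return the polygon area in square pixels using the shoelace formula."""
    n = len(xs)
    twice_area = 0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2


# Hypothetical polygon: a 40 x 30 pixel rectangle, not taken from the dataset.
xs = [10, 50, 50, 10]
ys = [20, 20, 50, 50]
bbox = polygon_to_bbox(xs, ys)
area = polygon_area(xs, ys)
print(bbox, area)
```

The same two helpers could be applied to every polygon region in the annotation files to derive box-level labels for a detector.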

Table 2. Extent of dataset

4 Material and Methods

4.1 VIA (VGG Image Annotator)

VGG Image Annotator (VIA) is an easy-to-use standalone program for manually annotating images, audio files, and videos. VIA needs no setup or installation; it simply runs in a web browser. The complete VIA program is contained in a single self-contained HTML page of less than 400 kilobytes and works as an offline application in most modern web browsers [30]. We annotated the images using the VIA tool and classified the annotated regions into weed and crop categories, as shown in Fig. 3. The region shape used for annotation is the polygon. There are 776 annotations in total, of which 393 are crop annotations and 383 are weed annotations. The extent of the dataset is given in Table 2.
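Per-class annotation counts such as those above can be recomputed from a VIA JSON export. The sketch below assumes the VIA 2.x layout, in which each image entry holds a list of regions with "shape_attributes" and "region_attributes"; the attribute name "label", the helper name count_labels, and the sample entries are assumptions made for illustration, so the key actually used in the exported file should be checked first.

```python
import json
from collections import Counter


def count_labels(via_annotations, attribute="label"):
    """Count regions per class in a VIA-style annotation dictionary.

    `attribute` is an assumed region-attribute name; check the exported
    JSON for the key actually used.
    """
    counts = Counter()
    for entry in via_annotations.values():
        regions = entry.get("regions", [])
        if isinstance(regions, dict):  # older VIA exports store a dict of regions
            regions = regions.values()
        for region in regions:
            label = region.get("region_attributes", {}).get(attribute, "unlabeled")
            counts[label] += 1
    return counts


# Tiny hand-written example in the VIA 2.x layout (not real dataset content).
sample_json = """
{
  "img_001.jpg1000": {
    "filename": "img_001.jpg",
    "size": 1000,
    "regions": [
      {"shape_attributes": {"name": "polygon",
                            "all_points_x": [10, 50, 50],
                            "all_points_y": [20, 20, 50]},
       "region_attributes": {"label": "crop"}},
      {"shape_attributes": {"name": "polygon",
                            "all_points_x": [60, 90, 90],
                            "all_points_y": [10, 10, 40]},
       "region_attributes": {"label": "weed"}}
    ],
    "file_attributes": {}
  },
  "img_002.jpg2000": {
    "filename": "img_002.jpg",
    "size": 2000,
    "regions": [
      {"shape_attributes": {"name": "polygon",
                            "all_points_x": [5, 25, 25],
                            "all_points_y": [5, 5, 30]},
       "region_attributes": {"label": "weed"}}
    ],
    "file_attributes": {}
  }
}
"""
counts = count_labels(json.loads(sample_json))
print(dict(counts))
```

For a real export, the string would be replaced by the contents of the annotation file in the train or val folder, read with json.load.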

4.2 Masking

A mask allows us to focus only on the portions of the image that interest us. Masking can be defined as setting specific pixels of an image to a null value such as 0 (black), so that only the portion of the image where the pixel value is not 0 is highlighted. In our program, we begin by reading the image using the cv2.imread() function in Python. We then convert the image to HSV format, in which the subsequent color-based operations are performed.

During masking, the images can be segmented into background and foreground. Figure 3 shows the mask and the masked image from the dataset.

Fig. 3. Images annotated and masked into crop/weed using the VIA tool and Python: (a) and (b) masks of images; (c) and (d) masked images

4.3 Field Setup and Acquisition Method

The 270-image dataset was captured at a precision agriculture potato farm in Northern India in December 2022, before manual weeding was applied. The potato plants were grown in single rows on small soil beds. Small, closely spaced intra-row weeds were present at data acquisition time. The images were captured with a Sony CyberShot W830 20.1 MP camera and mobile phone cameras in an uncontrolled environment. During data collection, the weather was clear, with no clouds. Specifications of the dataset are provided in Table 3.

Table 3. Specifications of Dataset

5 Work Flow

The raw photos were first captured with a Sony Cyber-shot camera and mobile devices, giving 600 pictures altogether. The data were then cleaned to eliminate duplicate, blurry, and noisy images, after which 270 images remained. The data were divided 80:20 between train and val folders. Each image was manually annotated using the VIA annotator, and the annotation tool exported JSON and CSV files. We manually constructed a mask for each image and used Python to mask each image. All files were uploaded to https://www.kaggle.com/datasets/rajni88/indianpotatoweed-dataset.
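The 80:20 division step can be illustrated with a short, reproducible split. How the published split was actually produced is not stated, so the seeded shuffle, the function name split_train_val, and the file names below are assumptions made for illustration only.

```python
import random


def split_train_val(filenames, train_fraction=0.8, seed=0):
    """Shuffle file names reproducibly and split them into train and val lists."""
    names = sorted(filenames)            # fixed starting order
    random.Random(seed).shuffle(names)   # seeded shuffle for reproducibility
    cut = round(len(names) * train_fraction)
    return names[:cut], names[cut:]


# Hypothetical file names; the real dataset uses its own naming.
images = [f"img_{i:03d}.jpg" for i in range(270)]
train, val = split_train_val(images)
print(len(train), len(val))
```

With 270 images and a fraction of 0.8, this yields 216 training images and 54 validation images, with no image appearing in both lists.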

Fig. 4. Workflow diagram of the proposed approach

6 Value of the Data

  • The dataset presents images of potato crops and weeds in their early growth stages, which can be used by agronomists and researchers in different fields for computer vision and smart farming.

  • The open-access dataset can be used for weed recognition and segmentation algorithms.

  • The dataset can be used to train, test, and validate convolutional neural network (CNN) models.

7 Conclusion

A potato crop and weed dataset for addressing weed issues in precision agriculture has been collected, masked, and posted on Kaggle. The images of crops and weeds were acquired using a Sony digital camera and a mobile camera in Punjab, India. During data collection, both inter-row and intra-row weeds were present in the field. The images were manually annotated using the VIA (VGG Image Annotator) tool. The dataset contains a total of 270 images divided into train and val folders.

This dataset can be used for weed detection, segmentation, and classification problems. We hope it will advance data acquisition and ground-truth generation in this domain and help researchers and agriculture experts to develop ground truth for weed management. In the future, the dataset can be extended with more images from different regions, seasons, and growth stages.