Early detection of weed in sugarcane using convolutional neural network

Weed infestation is a major factor in sugarcane productivity loss. The use of remote sensing data in conjunction with Artificial Intelligence (AI) techniques can take sugarcane cultivation to a new level in terms of weed control. For this purpose, an algorithm based on Convolutional Neural Networks (CNN) was developed to detect, quantify, and map weeds in sugarcane areas located in the state of Alagoas, Brazil. PlanetScope satellite images were subdivided into subsamples, separated by class, used to train the CNN in different scenarios, classified, and georeferenced, producing a map that includes weed information. Scenario one of the CNN training and testing presented the best overall accuracy (0.983) and was used to produce the final map of forest, sugarcane, and weed-infested areas. The quantitative analysis of the area (ha) infested by weeds indicated a high probability of a negative impact on sugarcane productivity. It is recommended that the CNN algorithm be adapted to Remotely Piloted Aircraft (RPA) images, aiming at the differentiation between weed species, and that it be applied to weed detection in areas with other crops.


Introduction
Weeds compete with crops for resources such as light, nutrients, water and space. When in the phenological stage of senescence, weeds produce thousands to hundreds of thousands of seeds that can survive for a long time, posing a major threat to crop productivity (Liang, et al., 2019).
Visual assessments of weed species coverage are carried out for their detection in sugarcane stands, and it is possible to group them according to the degree of infestation. However, phytosociological assessments can be performed with better precision using multivariate statistical techniques (Ferreira, et al., 2011).
Remote sensing has proven to be a viable technique for mapping weeds in agricultural crops and pastures, and multispectral images with a greater level of spatial detail have been used for this purpose (Sartori, et al., 2009; Rao, 2008).
In the last decade, Deep Learning tools have caused a real revolution in the area of computer vision. Two of the main reasons for this are the availability of databases with thousands of images and of computers capable of reducing the time needed to process these databases (Ponti and Costa, 2017).

International Journal for Innovation Education and Research Vol. 10 No. 11 (2022), pg. 211
Neural Networks are composed of several neurons and contain synaptic weights, where the knowledge acquired through training is stored (Haykin, 2001).
Among Deep Learning techniques, the Convolutional Neural Network (CNN) has demonstrated higher precision than other image processing techniques. It has been applied to several highly complex problems in agriculture, such as classification and prediction, and is effective when a large, high-quality dataset is available for training the model (Kamilaris and Prenafeta-Boldú, 2018; Weiss, et al., 2020; Zhang, et al., 2018).
From an agricultural dataset of 20,000 images of crop seedlings and weeds, obtained by a robot in the field at different phenological stages and camera angles, image processing made it possible to apply an R-CNN to the detection of crops and weeds, reaching high accuracy (Quan, et al., 2019; Shah, et al., 2021).
Comparisons of Deep Learning models demonstrated greater speed and accuracy when an R-CNN was applied to identify weeds in plots of cultivated plants (Espinoza, et al., 2020).
Despite the phenotypic similarity, a CNN can be applied to RGB images to distinguish between cultivated crops and weeds in the same field, reaching an accuracy of 0.893 (Xu, et al.).
A CNN can also be applied to detect weeds in images obtained in loco in controlled environments with cultivated crops, distinguishing the different types and obtaining an accuracy of 0.9944, that is, very close to an evaluation performed by a field expert (Haq, 2022).
The recognition of weeds and their distinction from crops using CNN can reach an accuracy of up to 0.9937 (Jiang, et al., 2020; Burks, et al., 2005).
Remote sensing has been widely used in agriculture given its vast potential for producing and analyzing spatial data for crop mapping and yield forecasting, among other uses. Currently, Deep Learning techniques are used for real-time processing of these data, which are stored in the cloud (Weiss, et al., 2020).
From PlanetScope satellite images, a large database of high spatial resolution images was generated, and through the application of Machine Learning and CNN algorithms, it was possible to classify areas of fire, presenting high accuracy (Oliveira, 2019). Therefore, the objective of this work is to develop an algorithm based on Convolutional Neural Networks (CNN) applied to satellite images, for the detection and mapping of weeds in sugarcane planted areas in the state of Alagoas.

Study area
The study area is located in the eastern mesoregion of Alagoas and covers the municipalities of Rio Largo and

Hardware, software and data
The hardware used in this work for the development of the CNN was: a) Intel i5-8265U processor, 1.6 -
The following georeferenced data were used: a) vector files (SHP and KML); and b) matrix files (GeoTIFF).
LibreOffice software was used for statistical analysis of confusion matrix, accuracy and Kappa coefficient.

Methodology
The methodology followed the steps described in the workflow (Figure 2).


Data acquisition
Freely available digital and georeferenced data sources were accessed via the internet. Initially, research was carried out to obtain vector files that represented the official limits of the municipalities of Maceió and Rio Largo, within the state of Alagoas, Brazil. For this, the IBGE Maps Portal was used, from which the al municipios.zip file was downloaded, containing vector data (SHP) referring to the 2018 municipalities mesh of the Alagoas state (IBGE, 2020).
The vector data produced by the Santa Clotilde mill, with information on the boundaries of stands and farms, were provided in KML format. With the QGIS software, the KML data were converted to the SHP format, which is the standard vector format used by this software.
Images covering the study area, as well as a large part of the sugarcane planted area used by the mill, were obtained from the PlanetScope satellite. These images are delivered already orthorectified, are composed of four multispectral bands, and have a spatial resolution of 3 meters. For this work, five overlapping scenes covering the study area were selected, one from 08/22/2020 and four from 06/28/2020, acquired free of charge through the Education and Research Program of the Planet platform.

Georeferenced database
Two georeferenced databases were then created: a) a vector database with shapefiles; and b) a matrix database with GeoTIFF files.

Database pre-processing
In this step, the QGIS software was used to create the mosaic of the PlanetScope satellite scenes, considering the imaging date of 08/22/2020, using the Miscellaneous tool. From the mosaic, the raster clipping tool was used to separate two areas of sugarcane plantation: one for extracting samples for CNN training and testing, and the other for classification. To clip the area to be classified (the study area), the vector file provided by the Santa Clotilde mill was used. With the same raster clipping tool, samples of the classes of interest (sugarcane, forest, weed and undefined) were extracted from the clipped image for training and testing (Figure 3). The undefined class was used to remove the edge of the image that does not include the area of interest for the CNN application.

Sample data
A Python script was developed to standardize the samples that served as input to the algorithm. This standardization divided each sample into tiles of 5x5 pixels, which increased the number of samples. That is, dividing the 11 samples, which had irregular sizes, into 5x5 tiles created 17185 subsamples from an image with a spatial resolution of 3 m, each subsample covering an area of 225 m² (Figure 4). The subsamples were separated into 4 folders, one per class: Sugarcane with 3694 subsamples, Weed with 3318 subsamples, Forest with 3393 subsamples and Undefined with 6780 subsamples (Table 1).
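The subsampling step can be sketched as follows. This is a minimal illustration and not the authors' script: the function names `tile_image` and `tile_area_m2`, the nested-list image representation, and the decision to discard incomplete edge tiles are all assumptions.

```python
def tile_image(image, tile_size=5):
    """Split a 2-D grid of pixel values into non-overlapping square tiles.

    `image` is a list of equally long rows. Edge rows and columns that do
    not fill a complete tile are discarded (an assumption; the source does
    not say how partial tiles were handled).
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    tiles = []
    for top in range(0, n_rows - tile_size + 1, tile_size):
        for left in range(0, n_cols - tile_size + 1, tile_size):
            tile = [row[left:left + tile_size]
                    for row in image[top:top + tile_size]]
            tiles.append(tile)
    return tiles


def tile_area_m2(tile_size=5, pixel_size_m=3.0):
    """Ground area covered by one square tile, in square metres."""
    side_m = tile_size * pixel_size_m
    return side_m * side_m


# A hypothetical 12 x 17 pixel sample yields 2 x 3 = 6 complete 5x5 tiles.
sample = [[r * 17 + c for c in range(17)] for r in range(12)]
tiles = tile_image(sample)
```

With the default arguments, `tile_area_m2()` gives 15 m x 15 m = 225 m², the subsample area stated in the text.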

Training and testing
Four scenarios with different parameterizations were tested to find the best CNN training and testing accuracy (Table 2). The scenarios differed in the following parameters:
- number of epochs: Scenario 1 was the only one that used 50 epochs; the other scenarios used 30 epochs;
- standard parameterization: The second was the first original scenario, that is, it was considered a standard to
- proportion of subsamples: The fourth scenario was the only one that used 75% of the subsamples for training and 25% for testing. The others used 50%-50%;
- number of convolution layers: The fourth scenario was the only one that used four convolution layers. The others used one; and
- input size: The fourth scenario was the only one that used input images of 50x50 pixels. The others used 5x5.
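The recoverable part of this parameterization, together with the train/test split, can be written down as a small sketch. The `Scenario` class, the `split_train_test` helper and the fixed random seed are hypothetical names and choices introduced here for illustration. Scenario 3 is left out because the parameter that distinguishes it cannot be recovered from the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    epochs: int
    train_fraction: float
    conv_layers: int
    input_size: int  # side length of the square input, in pixels


# Values taken from the description of the scenarios in the text.
SCENARIOS = [
    Scenario("scenario 1", epochs=50, train_fraction=0.50, conv_layers=1, input_size=5),
    Scenario("scenario 2", epochs=30, train_fraction=0.50, conv_layers=1, input_size=5),
    Scenario("scenario 4", epochs=30, train_fraction=0.75, conv_layers=4, input_size=50),
]


def split_train_test(items, train_fraction, seed=0):
    """Shuffle `items` reproducibly and split them into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Example: the 3694 Sugarcane subsamples under the 50%-50% split.
sugarcane_ids = range(3694)
train, test = split_train_test(sugarcane_ids, SCENARIOS[0].train_fraction)
```

Under the 50%-50% split, the 3694 Sugarcane subsamples yield 1847 test subsamples, which agrees with the Sugarcane count reported in the confusion matrix for scenario 1.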

Classify
The classification was performed with the classify algorithm, using the scenario that presented the best accuracy.
The objective of this algorithm was to represent the identified and separated subsamples of each class in an image built from the CNN output, so that each class can be distinguished through a geometric shape, similar to a map, with a different color.
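One way to picture this step is to paint each classified subsample as a colored block. The sketch below is an assumption-laden illustration, not the classify algorithm itself: the function `render_class_map`, the RGB values, and the white color for the undefined class are hypothetical. Only the color names (light green, red, dark green) come from the description of the final map.

```python
# Color names follow the map legend in the text; the RGB values are assumptions.
CLASS_COLORS = {
    "sugarcane": (144, 238, 144),  # light green
    "weed": (255, 0, 0),           # red
    "forest": (0, 100, 0),         # dark green
    "undefined": (255, 255, 255),  # assumed white background
}


def render_class_map(label_grid, tile_size=5):
    """Expand a grid of per-tile class labels into an RGB pixel grid.

    Each label is painted as a `tile_size` x `tile_size` block of its color.
    """
    pixels = []
    for label_row in label_grid:
        pixel_row = []
        for label in label_row:
            pixel_row.extend([CLASS_COLORS[label]] * tile_size)
        # Repeat the finished row once per pixel row in the tile.
        pixels.extend([list(pixel_row) for _ in range(tile_size)])
    return pixels


# Hypothetical 2 x 2 grid of predicted labels.
labels = [
    ["sugarcane", "weed"],
    ["forest", "undefined"],
]
image = render_class_map(labels)
```

The result is a 10 x 10 pixel grid in which each quadrant carries the color of one class.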

Georeferencer
The geometric shape produced by the classification is not georeferenced; therefore, it was necessary to develop the georeferencer algorithm. To georeference the geometric shape, the geographic coordinate parameters of the image of the study area, which served as the input layer, were used. With the Rasterio library, the bands of the geometric shape were inverted, and then the metadata of the image of the study area were transferred to the geometric shape, thus creating a georeferenced matrix map (GeoTIFF) representing the identified classes.
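Conceptually, transferring the metadata of the reference image gives every pixel of the classified image a map coordinate. The sketch below shows that mapping for a north-up raster without rotation. It does not use Rasterio and is not the georeferencer algorithm: the `GeoTransform` class and the example origin are hypothetical, and it assumes the classified image shares the 3 m grid of the reference image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """North-up affine georeference: upper-left corner plus pixel size."""
    origin_x: float    # map x of the upper-left corner of pixel (0, 0)
    origin_y: float    # map y of the upper-left corner of pixel (0, 0)
    pixel_size: float  # ground size of one pixel, in map units

    def pixel_center(self, row, col):
        """Return the map (x, y) of the center of pixel (row, col)."""
        x = self.origin_x + (col + 0.5) * self.pixel_size
        y = self.origin_y - (row + 0.5) * self.pixel_size  # y decreases downwards
        return x, y


# Hypothetical origin in a projected, metre-based coordinate system.
transform = GeoTransform(origin_x=190000.0, origin_y=8950000.0, pixel_size=3.0)
x, y = transform.pixel_center(row=0, col=0)
```

Once the classified image carries the same transform and coordinate reference system as the reference image, GIS software can overlay it on the farm boundaries.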

Equations
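The agreement between the classification and the reference data was analyzed with the Kappa coefficient (Equations 1, 2 and 3):

$$k = \frac{\rho_o - \rho_e}{1 - \rho_e} \qquad (1)$$

$$\rho_o = \frac{1}{N} \sum_{i=1}^{c} n_{ii} \qquad (2)$$

$$\rho_e = \frac{1}{N^2} \sum_{i=1}^{c} n_{i+} \, n_{+i} \qquad (3)$$

Where: $\rho_o$ is the relative observed agreement rate; $\rho_e$ is the hypothetical agreement rate expected by chance; $n_{ii}$ is the number of subsamples of class $i$ that were correctly classified; $n_{i+}$ and $n_{+i}$ are the row and column totals of class $i$ in the confusion matrix; $N$ is the total number of subsamples; and $c$ is the number of classes.
When the agreement between the data sets is total, $k = 1$.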
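The Kappa coefficient can be computed from a confusion matrix as in the following sketch, which uses the standard definition k = (ρo − ρe) / (1 − ρe). The function name `cohen_kappa` and the example matrix are illustrative assumptions, and the numbers are not data from this study.

```python
def cohen_kappa(matrix):
    """Compute (accuracy, kappa) from a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    Undefined (division by zero) when chance agreement equals 1,
    for example when only one class is present.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Illustrative two-class matrix (not data from this study).
accuracy, kappa = cohen_kappa([[20, 5], [10, 15]])
```

Here the observed agreement is 0.7 and the chance agreement is 0.5, so the Kappa coefficient is about 0.4. This shows why Kappa is generally lower than the overall accuracy for the same matrix.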

Analysis of CNN application scenarios
The first scenario presented the best CNN accuracy, 0.983. The confusion matrix indicated that all 1847 Sugarcane class subsamples were correctly classified, with no confusion with the Weed, Forest and N classes. For the Weed class, 1408 subsamples were correctly classified and 251 subsamples were confused with the Sugarcane class; there was no confusion with the Forest and N classes. Regarding the Forest class, 1696 subsamples were correctly classified and only one subsample was confused with the N class; there was no confusion with the Sugarcane and Weed classes. Class N had all its subsamples correctly classified (Table 3).
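The per-class behaviour described here can be summarized as recall values computed directly from the reported counts. The sketch below is an illustration: the helper `recall` and the dictionary layout are assumptions, and class N is omitted because its subsample count is not stated in this passage.

```python
# Test-set counts reported for scenario 1: (correct, confused with other classes).
reported = {
    "sugarcane": (1847, 0),
    "weed": (1408, 251),
    "forest": (1696, 1),
}


def recall(correct, confused):
    """Fraction of a class's reference subsamples that were labeled correctly."""
    return correct / (correct + confused)


recalls = {name: recall(*counts) for name, counts in reported.items()}
```

The Weed recall is about 0.849, so roughly 15% of the Weed test subsamples were labeled as Sugarcane, while Sugarcane and Forest were recovered almost completely.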

The curves representing accuracy tend to stabilize close to 1, and those representing the loss tend to stabilize close to 0.

Comparison of CNN and Kappa coefficient accuracy
Comparing the accuracy and the Kappa coefficient, the largest discrepancy between them (0.035) was observed in scenario 3, where the accuracy was 0.938 and the Kappa coefficient was 0.903. The best values for both accuracy and Kappa coefficient occurred in scenario 1, with a discrepancy of 0.014 (Table 4).


Geometric form and georeferencing
The classify algorithm was used to generate a geometric shape of the classes identified when the CNN algorithm was applied for training and testing the subsamples. The N class was removed from the resulting geometric shape for a better visualization of the limits of the sugarcane (light green), weed (red) and forest (dark green) classes. Then, the georeferencer algorithm was used to georeference the geometric shape, transforming it into a map (Figure 8).

Classes area
From the map, it was possible to quantify the area (ha) of each land use class within the Riacho de Pedra farm: 272.5 ha for Sugarcane, 97.8 ha for Weed and 80.6 ha for Forest. A weed infestation equivalent to 21.7% of the total area analyzed, which should be covered by Sugarcane, was detected (Table 5 and Figure 9).
Table 5: Area values (hectare) and percentage.
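The percentages follow directly from the reported areas, as the short sketch below shows. The helper `tiles_to_hectares` is a hypothetical addition that assumes the map is built from classified 5x5 tiles of 3 m pixels. The source does not state how the areas were actually measured.

```python
# Areas reported for the Riacho de Pedra farm, in hectares.
areas_ha = {"sugarcane": 272.5, "weed": 97.8, "forest": 80.6}

total_ha = sum(areas_ha.values())
shares = {name: 100 * area / total_ha for name, area in areas_ha.items()}


def tiles_to_hectares(n_tiles, tile_size=5, pixel_size_m=3.0):
    """Convert a count of classified tiles to hectares (1 ha = 10,000 m2)."""
    tile_area_m2 = (tile_size * pixel_size_m) ** 2
    return n_tiles * tile_area_m2 / 10_000
```

The total is 450.9 ha, of which Weed accounts for 97.8 / 450.9, or about 21.7%, matching the value given in the text. Under the stated assumption, each classified tile contributes 0.0225 ha.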

Analysis of CNN application
This article analyses the use of artificial intelligence techniques, more specifically deep learning, to identify essential problems in agricultural management. Within deep learning, an algorithm based on convolutional neural networks (CNN) was used, which can be trained for a particular application but requires a large set of image data. For this purpose, high spatial resolution images from the PlanetScope satellite were used, from which samples were taken to train the CNN algorithm to identify weeds in sugarcane plantation areas.
Alagoas has large areas of sugarcane plantations, as the state is historically among the largest producers of sugar and alcohol in Brazil. Therefore, the rapid identification of weeds makes it possible to bring forward the cultural treatments that control these species. In this way, the chances of obtaining high agricultural productivity increase and the amount of chemical herbicide applied to control weeds is reduced. Because the planted areas are large, large sugarcane producers need to carry out a pre-application using piloted agricultural aircraft, which causes a great environmental impact in the surroundings of the treated areas.
Therefore, the early detection of weeds can avoid this type of application and minimize environmental impacts. A PlanetScope scene covers an area of approximately 40,000 hectares, which makes it well suited to large producers that manage more than 30,000 hectares of sugarcane. That is, in a single pass of the algorithm over such a scene, it would be possible to detect the presence of weeds in a large area planted with sugarcane in the state of Alagoas.
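As a rough sense of scale, the number of pixels and 5x5 subsamples in one such scene can be estimated from the figures given in the text. This is back-of-the-envelope arithmetic under the assumption that the whole scene is tiled without overlap, and the variable names are illustrative.

```python
scene_area_ha = 40_000  # approximate scene coverage stated in the text
pixel_size_m = 3        # PlanetScope spatial resolution
tile_size = 5           # subsample side length, in pixels

scene_area_m2 = scene_area_ha * 10_000  # 1 ha = 10,000 m2
n_pixels = scene_area_m2 // pixel_size_m ** 2
n_tiles = scene_area_m2 // (tile_size * pixel_size_m) ** 2
```

This gives about 44 million pixels, or about 1.8 million subsamples to classify per scene.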
Compared with traditional methods, in which field visits are carried out by employees using motorcycles and cars, this detection method considerably decreases the time needed for evaluation, quantification and decision making, in addition to reducing the cost of fuel, parts and labor to repair these vehicles.
Having reached the required accuracy (0.983) using image samples from a given sugarcane plantation area, the algorithm can be applied in other areas with high chances of success in weed identification.
In the case of climatologically different environments and with different soil and relief conditions, there will be a high chance of occurrence of weed species/families different from those that were sampled for training the algorithm. In this case, it will be necessary to expand the dataset with samples of these new species/families and perform a new training of the CNN algorithm.
This allows the algorithm to be used in weed control management, reducing the frequency and amount of herbicide application and, consequently, the degradation of the environment (Yu, et al., 2019). In addition, this type of control avoids significant losses in agricultural productivity and quality (Jin, et al., 2022).
In all scenarios, the CNN accuracy values were higher than the Kappa coefficient values. Regarding the limitations of this study, the most important are climatological: the frequency and amount of rainfall and, consequently, the presence of clouds, which can make the use of PlanetScope satellite images unfeasible. The months between May and August are especially problematic in the state of Alagoas, as they include the winter season, characterized by mild temperatures and high rainfall.
Considering that the entire area planted with sugarcane in the state of Alagoas is concentrated in the coastal region, clouds are observed even in periods of the year without significant rainfall, which reduces the area available for viewing ground targets and collecting samples for training the CNN algorithm.
As future perspectives, we intend to use RPA images and, if possible, images from SAR (synthetic aperture radar) satellites.
Imaging via RPA would have the advantage of allowing the choice of a day and time without rain, unlike PlanetScope satellite images, which have predefined days and times for imaging a given location. To obtain more detailed information using CNN algorithms, such as the identification of the species/family of weeds in sugarcane plantations, RPA images offer a more adequate spatial resolution than PlanetScope images.
As for SAR images, the advantage is that, given the prevailing weather in Alagoas, with up to four months of cloudy skies per year, these satellites are able to collect images day and night and in any weather condition (rain). Furthermore, each scene covers an area greater than 50,000 km² and has a spatial resolution that can be finer than 1 meter. These images are well suited for use in sugarcane plantation areas in the state of Alagoas, as they can monitor large areas, provide detailed information through CNN algorithms and be collected even with rain and clouds.

Conclusion
The CNN algorithm showed high accuracy when applied to high spatial resolution images (PlanetScope satellite) to identify and analyze weed infestation in sugarcane plantation areas. When comparing different scenarios for the application of the CNN algorithm, the first scenario stood out, with an accuracy of 0.983. That is, the algorithm achieved a high level of learning and success in identifying weeds from a PlanetScope image. Regarding the comparison between accuracy and Kappa coefficient, the CNN accuracy values were higher than the Kappa coefficient values in all scenarios. This does not, however, invalidate the results obtained with the Kappa coefficient, which indicated almost perfect agreement for all scenarios.
In the quantitative analysis of the area infested by weeds, it is concluded that there is a high probability of a negative impact on the productivity of sugarcane at Fazenda Riacho de Pedra, since in a planted area of 450.9

Acknowledgement
The manuscript was partially funded by PIBIC (Program of Scientific Initiation Scholarships) and PIBITI (Program of Initiation Scholarships in Technological Development and Innovation) of the Federal University of Alagoas (UFAL).