A Model for Detecting Fungal Diseases in Cotton Cultivation using Segmentation and Machine Learning Approaches

Abstract: This research presents a model for detecting fungal diseases in cotton cultivation by processing images of cotton leaves. The work covered developing a model from a set of preprocessed data, formulating it, simulating it, and evaluating it. The image data were collected from an online data repository containing images of cotton leaves infected with fungal diseases as well as images of normal leaves. Additional images of infected and uninfected cotton leaves were collected in cotton production fields in the Ségbana region of the Republic of Benin. The model was formulated using the watershed segmentation technique, applying an edge detection algorithm and K-Means clustering, with a Support Vector Machine (SVM) for classification. The simulation was done in MATLAB with Image Processing Toolbox 9.4. The results gave an accuracy of 99.05%, a specificity of 90%, a misclassification rate of 0.95%, a recall of 99.5%, and a precision of 99.5%. These results were obtained with little computational effort and in less than a minute, showing the efficiency of the image processing technique for detecting and classifying infected and uninfected leaves. It was concluded that this approach can be applied to detect fungal diseases on cotton leaves and so promote the production and harvest of good-quality cotton and valuable cotton products.


I. INTRODUCTION
Cotton is a soft fibrous substance, usually white, composed of the hairs surrounding the seeds of various freely branching tropical plants (genus Gossypium) of the mallow family [1] [2]. The common name "cotton" comes from the Arabic "quotn" and generally refers to plants that produce spinnable fibers on their seed coat [3].
Cotton cultivation has improved the living conditions of rural and urban populations and contributed to the economic development of the CFA franc zone countries in Africa. Cotton production has quadrupled in the last two decades, making the region the world's second largest cotton exporter, accounting for 15% of global exports. As a result, cotton generates the largest receipts in several countries of the region and is the main cash crop for government revenues. Cotton cultivation has also reduced the poverty rate by employing more than two million rural households [4].
Due to fungal disease problems in cotton growing, agricultural industries face economic losses and a qualitative and quantitative decline in annual production. Nowadays, modern cotton industries are trying to ensure the quality and safety of products to provide healthy cotton to consumers.
Africa is an agricultural zone where most of the population relies on cotton farming. Farmers have a wide range of varieties from which to pick appropriate cotton crops. However, growing these plants for the greatest yield and product quality is highly technical and can be enhanced with the support of technology. The management of cotton crops needs careful observation, particularly to treat diseases that affect production and, consequently, the post-harvest life of the crop.
A disease is an alteration of one or more physiological processes, caused by irritation from some factor or agent (such as pests), that results in a loss of coordination in the plant. In other words, a plant disease is a harmful deviation from the normal functioning of the plant's physiological processes. Plant diseases can generally be classified in three ways: by localization area, by occurrence, and by causal agent [5].
Plant disease problems have arisen as a result of a considerable drop in the quality and quantity of agricultural output. Plant disease losses in Georgia (United States) in 2007 were estimated to be over $539.74 million. Approximately $185 million of this total was spent on disease control, with the remainder being the result of disease damage [6]. Several pests, including bacteria, aphids, insects, fungi, and others, cause serious plant diseases, but fungi are responsible for the majority of global losses [7].
Fungi are microorganisms found throughout the planet and are responsible for nearly 75% of all diseases. They are threadlike organisms whose filaments are known as hyphae. Anthracnose, Ascochyta blight, Black root rot, Boll rot, Charcoal rot, Leaf spot, Escobilla, Fusarium wilt, Lint contamination, Rust, Phymatotrichum root rot, Powdery mildew, Sclerotium stem and root rot (southern blight), Stem canker, Seedling disease complex, and Verticillium wilt are some of the fungal diseases that can affect cotton plants.
Observation by experts with the naked eye is the most widely available method for detecting and identifying diseases in practice. However, this requires constant supervision by professionals, which may be unaffordable on large farms. Furthermore, in some poor countries, farmers must travel considerable distances to reach specialists, making consultation prohibitively expensive and time-consuming.
www.ijacsa.thesai.org
The automatic diagnosis of plant diseases is a crucial area of research. It could be useful for monitoring large-scale crops, allowing symptoms of disease to be detected automatically as soon as they emerge on the plant's leaves. As a result, a rapid, automatic, less expensive, and more accurate method would be a distinct advantage for the quality and development of agricultural products and derivatives, particularly when it concentrates on a specific plant and disease category. This paper aims to develop a model that detects fungal diseases in cotton crops only, based on processing images of cotton leaves.
The paper's contributions can be summarized as follows:
1) Images of fungal diseases were acquired using a digital camera or any other device, then loaded and saved using Matrox Imaging Library (MIL) software. Semi-structured interviews and questionnaires were also administered to gather the requirements of the model.
2) A classification model for fungal disease detection was formulated based on the watershed segmentation technique, using K-Means clustering and an edge detection algorithm, with a Support Vector Machine (SVM) for classification.
3) The classification model for fungal disease detection was simulated using MATLAB's Image Processing Toolbox 9.4.
4) The performance of the model was assessed using accuracy, efficiency, and specificity as parameters.
This paper is organized as follows. Section II reviews relevant literature on image processing for plant diseases, and specifically for fungal diseases in cotton cultivation. Section III discusses the model design approach and the methodologies adopted in the course of the research. Section IV gives an overview of the simulation results of the designed model and analyzes them. Section V recaps the research work, draws conclusions, and suggests future work.

II. RELATED WORKS
There are numerous studies in the literature on plant disease detection and image processing technologies. According to Anju [8], the watershed transformation is a good tool for image segmentation in mathematical morphology. Marker-controlled segmentation was used for watershed transformation-based segmentation. The Prewitt edge detection operator, a powerful tool for image segmentation, was used to demonstrate another image segmentation technique that included image enhancement and noise removal strategies. The new method was evaluated and compared to the existing one. However, this work needs improvement, particularly in evaluating the method's performance.
Rani and Mahip [9] published an article on the use of machine learning to detect various plant diseases using various image processing techniques. Today, image processing has become an essential technique for diagnosing different plant diseases during cultivation. Any part of the crop can be affected by disease. The paper focused on detecting and classifying various diseases of cotton crops. Many classification techniques exist, such as the k-Nearest Neighbor classifier, genetic algorithm, k-means classifier, Probabilistic Neural Network, Support Vector Machine, Principal Component Analysis, Artificial Neural Network, and fuzzy logic. Choosing a classification technique was a difficult task because the quality of the result may differ across input data. The paper provided an overview of various classification techniques used to classify plant leaf diseases.
Thikjarathi and Abirami [10] proposed an application of image processing to diagnosing guava leaf diseases. In this study, scaled leaf images with improved contrast were subjected to region-growing segmentation, color transformation (YCbCr, CIELAB), and the Scale Invariant Feature Transform (SIFT). Support Vector Machine (SVM) and k-Nearest Neighbor (k-NN) classifiers were investigated for disease-wise classification accuracy. While both SVM and k-NN perform well, the former has a slight accuracy advantage. However, the methodology for evaluating classifiers on large datasets could be improved in future work.
Zhang et al. [11] tested an automatic image segmentation algorithm for diseased cotton leaves under natural conditions. The authors employed a segmented monotone decreasing edge composite function, energy function guidance information, the Heaviside function, and the t penalty function to create their model ϕ(x). The results indicated that a model of the cotton leaf edge profile curve could be constructed for cotton leaves against backgrounds of bare soil, straw mulching, and plastic film mulching, and that the optimal edge of the ROI could be obtained even under non-uniform lighting. Against a complex background, the model can segment cotton leaves with uneven illumination, shadow, and weed backdrop, and it extracts the edge of the leaf blade well. However, this approach is limited by the time it takes to complete the segmentation (an efficiency problem). Other image processing steps, including preprocessing, feature extraction, and image classification, are omitted.
In "Detection and Identification of Rice Leaf Diseases Using Multiclass SVM and Particle Swarm Optimization Technique" [12], the authors presented a new approach for detecting and identifying rice leaf diseases by combining K-means, SVM, and multiclass classification. The gray level co-occurrence matrix (GLCM) was used to extract features. An SVM classifier was used to categorize the diseases, and Particle Swarm Optimization (PSO) was used to improve detection accuracy. According to the test results, the proposed methodology was 97.91% accurate in disease identification. Among the classifiers used for comparison, namely the feed-forward neural network (FFNN), SVM, and k-nearest neighbour (KNN) classifiers, the reported accuracies were 77.96%, 85.64%, and 90.56%.
Kumari et al. [13] suggested an automatic disease detection system for three fungal diseases in cotton crops: Alternaria leaf spot fungal disease (ALSFD), Rust Foliar Fungal Disease (RFFD), and Grey Mildew Cotton Disease (GMCD). The k-means clustering approach was used for disease segmentation on cotton leaves, and the results were passed to Artificial Neural Network (ANN) and Support Vector Machine (SVM) classifiers for disease categorization.
Khalmar and Khan [14] worked on automatic early segmentation of leaf spot disease on cotton plant leaves using image processing and machine learning. The image is captured with a digital camera and, after preprocessing, the infected region is segmented using a k-means clustering technique. The study detected only bacterial leaf infections and did not classify the leaves of cotton plants.
Carderia et al. [15] proposed using deep learning techniques to identify cotton leaf defects. Their study presents a deep learning-based method for screening cotton leaves in order to monitor leaf health. In that work, no segmentation was performed on the leaves; instead, features of the leaves were identified and used for classification.
Statement of Problem
Existing techniques for detecting diseases in cotton cultivation suffer from several setbacks: two different diseases with the same apparent symptoms are confused during detection, and the stages of disease detection are often incomplete and perform poorly. This leads to improper management of cotton crops and causes low productivity. There is therefore a need for a classification model for disease detection that focuses on a specific class of diseases (fungal diseases) to improve cotton cultivation. To fill this gap, the proposed study provides a model that can accurately and automatically detect fungal diseases in cotton cultivation. The outcome of this research will help ensure the quality of cotton and provide healthy cotton to consumers.

III. PROPOSED MODEL
This section describes the developed model. The proposed model architecture, depicted in Fig. 1, improves on the automatic segmentation of infected cotton leaves under a natural environment by Zhang et al. [11].

A. Steps for Pre-processing
The preprocessing steps are: input image; background subtraction; conversion of the RGB image to grayscale and HSV images; conversion of the grayscale image to a binary image; and filtering. Fig. 2 shows the different stages of image preprocessing.

B. Development of a Formula for RGB to Grayscale
The RGB-to-grayscale formula is generally expressed as a weighted sum of R, G, and B, with small variations in the coefficients across papers, including:
i. Equation (1), according to Padmarathi and Thangadurai [16]
ii. $I = 0.299R + 0.5870G + 0.1140B$ (2), according to Saravanan [17]
iii. $I = 0.3R + 0.59G + 0.11B$ (3), according to Kanan and Cottrell [18]
iv. $I = 0.21R + 0.71G + 0.07B$ (4), according to Samuel et al. [19]
In i, ii, iii, and iv, the luminance is formed with red contributing r%, green contributing g%, and blue contributing b%.
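As a minimal sketch of the weighted-sum conversion described above, the Python snippet below applies the three coefficient sets that survive in the text to an RGB image stored as nested lists. The function names and the dictionary keys (`eq2`, `eq3`, `eq4`) are illustrative assumptions, not names used by the authors, whose simulation was done in MATLAB.

```python
# Coefficient sets quoted in the text for I = wr*R + wg*G + wb*B.
# The keys are hypothetical labels for equations (2) to (4).
COEFFICIENTS = {
    "eq2": (0.299, 0.587, 0.114),
    "eq3": (0.30, 0.59, 0.11),
    "eq4": (0.21, 0.71, 0.07),
}


def rgb_to_gray(pixel, weights):
    """Return the grayscale intensity of one (R, G, B) pixel."""
    r, g, b = pixel
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def image_to_gray(image, weights):
    """Convert an image given as rows of (R, G, B) tuples to grayscale."""
    return [[rgb_to_gray(pixel, weights) for pixel in row] for row in image]


if __name__ == "__main__":
    leaf = [[(34, 139, 34), (255, 255, 255)],
            [(139, 69, 19), (0, 0, 0)]]
    for name, weights in COEFFICIENTS.items():
        print(name, image_to_gray(leaf, weights))
```

Note that the coefficients of equation (4) sum to 0.99 rather than 1, so a pure white pixel maps to slightly less than 255 under that variant.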

C. Binary Image
A binary image is a digital image that is also described as bi-level or two-level, because each pixel requires only a single bit, 0 or 1. Generally, 0 represents black and 1 represents white. In digital image processing, binary images are used as masks or arise as the result of common operations such as segmentation, thresholding, and dithering. Fig. 3 illustrates an example of transforming an RGB image to a grayscale image and a binary image. Fig. 4 shows another example of transforming an RGB image to grayscale, binary, and HSV images. In the RGB-to-HSV conversion, r, g, and b stand for the red, green, and blue components normalized to the range [0, 1] [20].
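The two conversions mentioned here can be sketched as follows. The fixed threshold of 128 is an assumption made only for illustration (the text does not state how the threshold was chosen), and the HSV conversion relies on Python's standard `colorsys` module, which expects r, g, and b already normalized to [0, 1].

```python
import colorsys


def to_binary(gray, threshold=128):
    """Map each grayscale value to 1 (white) or 0 (black).

    The threshold value is a hypothetical choice, not one from the paper.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


def rgb_to_hsv_normalized(r, g, b):
    """Convert 8-bit RGB to HSV after normalizing each channel to [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)


if __name__ == "__main__":
    print(to_binary([[0, 200], [128, 127]]))
    print(rgb_to_hsv_normalized(255, 0, 0))
```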

E. Image Filtering
In image preprocessing, filtering is an important step for reducing image noise and improving the visual quality of an image. Image filtering is useful in applications such as smoothing, sharpening, noise suppression, and edge detection. There are several types of image filtering techniques, such as Laplacian filtering, low-pass filtering, and high-pass filtering. In this study, the median filtering technique was used. Filtering is applied in the preprocessing step to reduce noise and so improve the result of subsequent processing, such as edge detection. The median filter is a nonlinear digital filter that suppresses the noise in the image. Fig. 5 shows an example of obtaining a filtered image from an original image.
K-means Clustering: K-means is frequently used to find the natural grouping of pixels in an image. It is a simple and quick method, which has attracted a significant number of users. A clustering approach is useful for partitioning a vector space: the objects are grouped around the centroids. K-means clustering here uses the L*a*b* color space, with luminosity layer 'L*', chromaticity layer 'a*' indicating where the color falls along the red-green axis, and chromaticity layer 'b*' indicating where the color falls along the blue-yellow axis. The 'a*' and 'b*' layers contain all the color information. The generalized pseudocode of the traditional k-means algorithm and the traditional k-means method according to Oyelade et al. [21] are shown in Algorithm 1 and Algorithm 2, respectively.
Algorithm 1: Generalized pseudocode of the traditional k-means
1. Accept the number of clusters to group data into and the dataset to cluster as input values.
2. Initialize the first k clusters.
2.1 Take the first k instances, or
2.2 Take a random sample of k elements.
3. Calculate the arithmetic mean of each cluster.
4. K-means assigns each record in the dataset to only one of the initial clusters.
4.1 Each record is assigned to the nearest cluster using a measure of distance (e.g., Euclidean distance).
5. K-means re-assigns each record in the dataset to the most similar cluster and re-calculates the arithmetic mean of all the clusters in the dataset.
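The steps of Algorithm 1 can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions: the points stand in for (a*, b*) chromaticity pairs, initialization takes the first k instances (option 2.1), and iteration stops when assignments no longer change or after `max_iter` passes. The name `kmeans` and its parameters are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance; it ranks neighbors like the true distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Step 2.1: take the first k instances as the initial cluster means.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Steps 4 and 5: assign every record to its nearest cluster mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means are too
        labels = new_labels
        # Steps 3 and 5: recompute the arithmetic mean of each cluster.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    ab_pairs = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    print(kmeans(ab_pairs, k=2))
```

Taking the first k instances keeps the sketch deterministic; random sampling (option 2.2) is the more common choice in practice because it is less sensitive to the ordering of the data.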

F. Edge Detection
Edge detection is employed in image analysis to determine the boundaries of regions. In human vision, and probably in many other biological vision systems as well, edges and contours play a significant role. It is not just the edges that catch the eye; a few essential lines can frequently be used to describe or reconstruct an entire figure. Edges are significant local changes of intensity in an image. The causes of intensity changes include geometric events (discontinuities in surface orientation or boundary, in depth, and in color and texture) and non-geometric events (changes of illumination, specularities, shadows, and inter-reflections).
There are many techniques for detecting edges in an image. In this study, an approximation of the first derivative computed with the two Sobel operators is used. Because derivatives amplify noise, the smoothing effect of the Sobel operators is an attractive feature. The derivatives are implemented using the gradient magnitude.
For a function f(x, y), the gradient of f at the coordinates (x, y) is defined as the two-dimensional column vector
$$\nabla f = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \partial f / \partial x \\ \partial f / \partial y \end{bmatrix}$$
where $G_x$ is the gradient along the x direction and $G_y$ is the gradient along the y direction (Sandiya and Patial [22]). Fig. 6 shows the edge detection with the use of Sobel operators in segmentation.
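A minimal sketch of Sobel-based edge detection follows. It uses the standard 3x3 Sobel kernels to approximate the gradients along x and y and combines them into the gradient magnitude, sqrt(Gx^2 + Gy^2). Leaving border pixels at zero is a simplifying assumption of this sketch, and the function name is hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for the x and y derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


if __name__ == "__main__":
    step_edge = [[0, 0, 10, 10] for _ in range(4)]  # vertical edge in the middle
    for row in sobel_magnitude(step_edge):
        print(row)
```

Thresholding the magnitude image then yields a binary edge map; the threshold itself would have to be tuned on the leaf images.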

G. Image Segmentation
Image segmentation is the process of breaking a digital image down into multiple segments. Segmentation is a technique for giving meaning to an image or making it easier to analyze. It is used to locate objects and boundaries such as curves and lines. In an image segmentation operation, every pixel in the image is assigned a label such that pixels with the same label share the same visual features. The result of image segmentation is a set of segments that collectively cover the whole image, or a set of contours extracted from the image. Within a region, the pixels are similar with respect to some characteristic or computed property, such as color, texture, or intensity.

H. Morphological Watershed Segmented Images
The key to using the watershed transform for segmentation is to transform the image into one whose catchment basins correspond to the objects to be recognized. Infected and non-infected leaves are shown in Fig. 7 as morphological watershed segmented images.

I. Watershed Approach
1) The Algorithm
The principle of the algorithm is as follows: assume that a hole is drilled in each regional minimum and that the entire topography is flooded from below, allowing the water to rise at a constant rate through the holes. Flooded pixels are those that are below the water level at any given time. As the water level increases, the size of the flooded areas increases. Eventually, the water reaches a level where the water from two separate catchment basins is about to merge. When this occurs, the algorithm constructs a one-pixel-thick barrier between the two regions. The flooding continues until the entire image is divided into separate basins delimited by watershed lines, or ridge lines.
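The flooding idea can be sketched with a small marker-based, priority-queue implementation. This is a simplified illustration rather than the morphological watershed used in the MATLAB simulation: it assumes the caller supplies one positive marker label per regional minimum, uses 4-connectivity, and marks barrier pixels with -1. The function and constant names are hypothetical.

```python
import heapq
from itertools import count

WATERSHED = -1  # label given to barrier (ridge line) pixels


def watershed(elevation, markers):
    """Flood `elevation` from the labeled `markers` (0 = unlabeled pixel)."""
    h, w = len(elevation), len(elevation[0])
    labels = [row[:] for row in markers]
    heap, queued, tie = [], set(), count()

    def neighbors(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield ny, nx

    def push_unlabeled_neighbors(y, x):
        for ny, nx in neighbors(y, x):
            if labels[ny][nx] == 0 and (ny, nx) not in queued:
                queued.add((ny, nx))
                # Lowest elevation is flooded first; `tie` keeps insertion order.
                heapq.heappush(heap, (elevation[ny][nx], next(tie), ny, nx))

    for y in range(h):
        for x in range(w):
            if labels[y][x] > 0:
                push_unlabeled_neighbors(y, x)

    while heap:
        _, _, y, x = heapq.heappop(heap)
        basins = {labels[ny][nx] for ny, nx in neighbors(y, x) if labels[ny][nx] > 0}
        if len(basins) == 1:
            labels[y][x] = basins.pop()      # water from a single basin reaches it
            push_unlabeled_neighbors(y, x)
        else:
            labels[y][x] = WATERSHED         # two basins meet: build a barrier
    return labels


if __name__ == "__main__":
    profile = [[1, 0, 1, 3, 1, 0, 1]]        # two minima separated by a ridge
    seeds = [[0, 1, 0, 0, 0, 2, 0]]
    print(watershed(profile, seeds))
```

On the one-row profile above, the two basins grow outward from their minima and the ridge pixel between them becomes the one-pixel barrier.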

J. Feature Extraction: Scale-Invariant Feature Transform (SIFT)
Any object has various features and important details that can be extracted to produce a description of it. This description is used to identify the object in an image that contains many other objects. The SIFT method converts an image into a large set of local feature vectors in order to generate image features. These feature vectors are invariant to image scaling, translation, and rotation. The SIFT algorithm uses a four-stage filtering approach to extract these features.

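K. Development of Formula for Difference of Gaussians Calculation
The difference of Gaussians is calculated as
$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$$
where D(x, y, σ) is the difference of Gaussians, k is the constant factor separating two adjacent scales, * denotes the convolution operator, and I(x, y) is the input picture. The scale space is calculated by the function
$$L(x, y, \sigma) = I(x, y) * G(x, y, \sigma)$$
with
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$$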
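The difference-of-Gaussians computation can be sketched as below, using the standard SIFT definition D(x, y, σ) = L(x, y, kσ) − L(x, y, σ). Because convolution is linear, D equals the input image convolved with the kernel G(x, y, kσ) − G(x, y, σ), so the sketch builds that kernel directly. The scale factor k, the kernel radius, and the renormalization of the truncated Gaussian are assumptions of this example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Sample G(x, y, sigma) on a (2*radius+1)^2 grid and normalize it to sum to 1."""
    kernel = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
         for x in range(-radius, radius + 1)]
        for y in range(-radius, radius + 1)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def dog_kernel(sigma, k, radius):
    """Kernel of the difference of Gaussians: G(k*sigma) - G(sigma)."""
    wide = gaussian_kernel(k * sigma, radius)
    narrow = gaussian_kernel(sigma, radius)
    return [[a - b for a, b in zip(rw, rn)] for rw, rn in zip(wide, narrow)]


if __name__ == "__main__":
    # k = sqrt(2) is an illustrative value, not one taken from the paper.
    kernel = dog_kernel(sigma=1.0, k=math.sqrt(2), radius=4)
    print(kernel[4][4])  # center coefficient
```

Convolving an image with this kernel responds strongly to blob-like structures at the scale σ and gives a zero response on flat regions, since the kernel coefficients sum to zero.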

L. Support Vector Machine (SVM) Classification
SVM is a technique for data analysis and pattern recognition. It is used for tasks such as regression analysis and classification. The SVM algorithm takes a set of input data and predicts a class for each individual input. SVM is a linear model-based learner that uses the extracted features to build a hyperplane that maximizes the separation between the hyperplane and the nearest training points.
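To make the hyperplane idea concrete, the sketch below trains a linear SVM by sub-gradient descent on the regularized hinge loss. It is a toy stand-in for the MATLAB classifier, not the authors' implementation: the learning rate, regularization strength, epoch count, and the standardized two-dimensional feature vectors are all hypothetical, with +1 standing for "infected" and -1 for "uninfected".

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=20):
    """Fit w and b so that sign(w.x + b) separates the two classes."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink w toward a wide-margin solution.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Sample is inside the margin or misclassified: push it out.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Hypothetical standardized feature vectors for six leaves.
    X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, x) for x in X])
```

Real leaf features would need to be scaled before training, and a kernel SVM may be required when the classes are not linearly separable.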

a) Contrast
This function returns a measure of the intensity difference between a pixel and its neighbor over the entire image.
According to the results in Table I, normal cotton leaves have the following features: contrast between 0 and 0.09, homogeneity between 0.98 and 1.0, and energy between 0.98 and 1.0. According to the results in Table II, diseased cotton leaves have the following features: contrast between 0.14 and 0.3, homogeneity between 0.20 and 0.97, and energy between 0.3 and 0.97. This analysis shows that the contrast of a diseased leaf image is higher than that of a normal leaf, whereas its homogeneity and energy are lower. Hence, images with these characteristics are judged to be diseased.
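The three texture features discussed above can be sketched from a gray-level co-occurrence matrix (GLCM). That the features come from a GLCM of horizontally adjacent pixels is an assumption of this example (it mirrors the default behavior of MATLAB's `graycomatrix` and `graycoprops`); the text itself does not state the offset or the number of gray levels.

```python
def glcm_features(img, levels):
    """Return (contrast, homogeneity, energy) of a quantized grayscale image.

    `img` holds integers in [0, levels) and must be at least two pixels wide.
    Pairs are counted between each pixel and its right-hand neighbor.
    """
    glcm = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            glcm[a][b] += 1
    total = sum(map(sum, glcm))
    p = [[c / total for c in r] for r in glcm]  # joint probabilities

    cells = [(i, j) for i in range(levels) for j in range(levels)]
    contrast = sum((i - j) ** 2 * p[i][j] for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    return contrast, homogeneity, energy


if __name__ == "__main__":
    uniform = [[2, 2, 2], [2, 2, 2]]    # smooth, healthy-looking texture
    spotted = [[0, 1, 0, 1]]            # alternating, lesion-like texture
    print(glcm_features(uniform, levels=4))
    print(glcm_features(spotted, levels=4))
```

A perfectly uniform patch gives contrast 0 with homogeneity and energy both 1, which matches the direction of the ranges reported for normal leaves, while an alternating texture raises contrast and lowers the other two features.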

A. Classification Results and Some Comparisons with the Existing Model
Because the first manifestations and symptoms of fungal disease appear on the leaves of the plant, detection and classification techniques are developed around leaf images. Color, texture, and shape features are widely used in detecting and classifying plant infections. Following segmentation, the texture, color, and shape properties of the diseased areas are extracted and used as input to the SVM classifier.
According to Table III, which summarizes the preprocessing and image processing steps carried out, the existing model performs only the image segmentation step, whereas the developed model performs image preprocessing, segmentation, feature extraction, and classification.

B. Distribution of Images
A set of 500 images of infected cotton leaves was used to validate the disease detection approach. Of these 500 images, 300 (60%) were used for training the system and 200 (40%) for testing. Table V illustrates the distribution of images.

C. Evaluation of the Model for Fungal Diseases Detection in Cotton Cultivation
The proposed approach was evaluated on the basis of precision and recall, and the images were grouped according to the results generated by the system: TP (rule matched and disease present), FP (rule matched and no disease present), TN (no rule matched and no disease present), and FN (no rule matched and disease present).

1) TP (true positive): Infected images that are predicted to have a fungal disease.
2) TN (true negative): Images not infected by fungal disease that are predicted to be free of infection.
3) FP (false positive): No fungal disease is present in the image, but the system predicts that one is.
4) FN (false negative): A fungal disease is present in the image, but the system predicts its absence.
5) GT (ground truth): The number of comparisons made during the testing process.
6) TC (total cases): The total number of comparisons made during the testing process.
7) RM (result of method): The overall number of false-positive and true-positive system predictions.
8) Accuracy: The performance of the system as a percentage of correct predictions.
9) Misclassification: The percentage of times the system has predicted incorrectly.
10) Recall: The percentage of positive cases detected by the system.
11) Precision: The percentage of the system's positive predictions that are correct.
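The measures listed above follow directly from the four confusion-matrix counts, as the sketch below shows. The counts in the usage example are hypothetical: they are not taken from the paper's tables and were chosen only because they are one combination that reproduces the percentages reported in the abstract. Specificity, which the abstract also reports, is computed as TN / (TN + FP).

```python
def evaluate(tp, tn, fp, fn):
    """Return the evaluation measures (as fractions) from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts, used only to illustrate the formulas.
    scores = evaluate(tp=199, tn=9, fp=1, fn=1)
    for name, value in scores.items():
        print(f"{name}: {100 * value:.2f}%")
```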
These results are presented in Tables VI and VII. The evaluation results in Table VIII showed that the proposed technique gives 99.05% accuracy with a 0.95% misclassification rate. Furthermore, the recall is 99.5%, the specificity is 90%, and the precision is 99.5%. The system's efficiency is shown by the reduction in processing time and in the cost of expertise. For example, on a 3.0 GHz Pentium IV PC with 1 GB of RAM, the average processing time of the improved method, from image preprocessing to classification, was 49.6 seconds. The time required to obtain a result has thus been considerably reduced. This allows farmers to save time (less than a minute for a result) and resources (the cost of expertise is considerably reduced, and it is not necessary to make an appointment with an expert to assess the quality of cotton).
In this research, the detection of diseased cotton leaves and the classification of infected and uninfected leaves were carried out using image processing and machine learning approaches, to assist farmers in their struggle against disease outbreaks by helping them make the right decisions to increase productivity and harvest good-quality cotton. The results showed that the proposed model can accurately distinguish between infected and uninfected cotton leaves. However, the model is limited to detecting fungal diseases on cotton leaves. In future research, this work could be extended to other categories of cotton plant diseases, such as bacterial diseases, seedling diseases, and boll rots, and the system could be implemented as a mobile and web application to replace the manual identification of plant defects, which has become a long and costly process.
In general, the use of image processing to detect and classify diseases is not restricted to a specific domain. For sustainable agriculture, this work can be applied to identify diseases in, or assess the quality of, vegetables with high accuracy. This work will contribute significantly to the development of agricultural research.