Artificial Intelligent Techniques for Palm Date Varieties Classification

The demand for high-quality palm dates is increasing because of their energy value and nutrient content, which are of great importance in the human diet. To meet consumer and market standards under large-scale production in Oman, one of the top date producers, an inline classification system is needed. This paper investigates the potential of Machine-Learning (ML) techniques for automatically classifying, without any physical measurement, the six most popular date fruit varieties in Oman. The effect of color, shape, size, and texture features, and of the critical classifier parameters, on the classification efficiency is examined. Three ML techniques are used for automatic classification and qualitative comparison: (i) Artificial Neural Networks (ANN), (ii) Support Vector Machine (SVM), and (iii) K-Nearest Neighbor (KNN). Merging the color, shape, and size features yields the highest accuracy. Experimental results show that the ANN classifier outperforms both SVM and KNN with the highest classification accuracy of 99.2%. The vision system developed in this paper can be integrated into date packaging factories.
Keywords: Palm date; feature extraction; machine learning; computer vision


I. INTRODUCTION
Date fruits are of great importance in the human diet because of their high energy value and nutrient content. According to the Food and Agriculture Organization (FAO), the largest date-producing countries in the world are located in the Middle East and North Africa. Oman produces about 260-270 million tons and ranks among the largest date-producing countries in the world [1][2]. However, only around 7000 tons of date fruits are reported as exported [3]. This low figure could be related to the stricter international export standards for color, size, softness, freshness, etc. To diversify its sources of income, Oman has made palm dates a priority for its economy. It has encouraged the planting of date palms, which now comprise more than 250 varieties. Texture, size, shape, and color are the main features used to differentiate between varieties [4][5][6]. Khalas, Fardh, Khunaizi, Qash, Naghal, and Maan are known as the six most popular date fruit varieties in Oman. The sweetest variety is Khunaizi and the most delicious variety is Khalas [2]. In the date industry, classifying dates into distinct classes is an essential task. Intelligent computerized systems can automate this task and classify date varieties faster and more accurately than the traditional manual approach. As a result, the related industries are improving their products in both quality and quantity [7,8].
This paper proposes a computerized vision system that automatically classifies date varieties using image processing techniques combined with Artificial Intelligence algorithms. Traditionally, palm dates are classified by grade [9]. In 2012, computer vision and pattern recognition were introduced for automatic date fruit classification. The authors tested seven categories of dates and extracted fifteen features. To compare the results, they used multiple classifiers such as Neural Networks (NN), Linear Discriminant Analysis (LDA), and Nearest Neighbor [10]. A sorting system for date fruits based on ANN was also presented in 2012. Two neural network models were used: the first is a multi-layer perceptron (MLP), and the second is a Radial Basis Function (RBF) network with a backpropagation learning algorithm. Accuracies of 87.5% and 91.1% were achieved using MLP and RBF, respectively [11]. An automatic system that classifies different types of dates from their images was presented in [12,13]. In that work, features such as shape, texture, and color are extracted from the images of the dates. The Fisher Discrimination Ratio (FDR) is used to reduce the dimension of the feature vector, and SVM is used as the classifier [12,13]. For date classification based on hardness, a system equipped with a monochrome camera was presented in 2016. That study used histogram and texture features, with LDA and ANN implemented as classifiers [14][15][16]. In 2018, an automated system was developed that identifies the maturity status of date fruits and classifies their categories. Color, size, and skin texture features are extracted. The system counts the number of dates, classifies them into different classes, and identifies defects [17].
Our aim is to classify automatically, without any physical measurement, the top six date palm varieties in Oman. Color, shape, size, and texture features are extracted from various date images. Artificial Neural Network (ANN), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) classifiers are proposed, and a comparative performance analysis is conducted. This paper is organized as follows: Section II describes the materials and methods. Experimental results and discussions are given in Section III. Section IV concludes the paper.

II. MATERIALS AND METHODS
The flowchart of the proposed and developed system is given in Fig. 1. First, a dataset of color images of dates is acquired, where each image contains only a single date. To obtain images free of noisy segmented pixels, operations such as segmentation and mathematical morphology are then applied. Different features are then extracted from the resulting segments. In the training phase, the classifiers are first trained on each feature group separately to assess the importance of each feature. The classifiers are then trained again with combinations of two or more feature groups. Based on the results achieved in each training process, the most effective features are determined and used to update the learning parameters of the classifiers. In the testing phase, new and unseen data are presented to the classifiers.

A. Samples Collection
The most valuable and common date varieties in Oman, namely Khalas, Fardh, Khunaizi, Qash, Naghal, and Maan, are used in this study, as shown in Fig. 2. AL-Dhahirah Governorate is the main source of the collected samples. The dataset comprises six classes with 100 samples each.

B. Image Acquisition System
The developed system uses an RGB color camera (EOS 1100D, Canon, Taiwan; resolution of 4272 × 2848 pixels), fluorescent lights, and a personal computer [18,19]. A white A4 sheet of paper serves as the background. Each date is placed manually 15 cm away from the camera. The camera's self-timer mode is used to take the image of each sample. To reduce the expected noise, additional images have been taken.

C. Preprocessing and Segmentation
The algorithms for the different operations on the acquired images were developed in MATLAB (Version R2014a, The Mathworks Inc., Natick, MA, USA). The image processing procedure is illustrated in Fig. 3. For simple and fast processing, the image samples are first resized. The color images are then converted to grayscale, while copies of the color images are kept for later processing. The foreground and background regions are separated in the grayscale images using Otsu's method [20], followed by morphological operations.
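The thresholding step can be illustrated with a minimal Python sketch of Otsu's method. This is not the authors' MATLAB code. The function names are hypothetical, the input is assumed to be an 8-bit grayscale image stored as a list of rows, and the date is assumed to be darker than the white paper background, so pixels at or below the threshold are marked as foreground. The morphological clean-up is not shown.

```python
def otsu_threshold(gray_pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * levels
    for value in gray_pixels:
        hist[value] += 1

    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low = 0   # number of pixels with intensity <= t
    sum_low = 0     # sum of the intensities of those pixels
    for t in range(levels):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0:
            continue            # no pixel in the lower class yet
        if count_high == 0:
            break               # every pixel is already in the lower class
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        between_var = count_low * count_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def segment_dark_object(gray_image):
    """Binary mask: 1 where a pixel is at or below the Otsu threshold.

    Assumption: the object (date) is darker than the background (white paper).
    """
    flat = [value for row in gray_image for value in row]
    t = otsu_threshold(flat)
    return [[1 if value <= t else 0 for value in row] for row in gray_image]
```

For a toy 3 × 3 image with one dark pixel on a bright background, `segment_dark_object` marks only that pixel as foreground.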

D. Extraction of Features
Identifying the effective features is considered the most challenging step in the classification of dates. The most important features for date classification are color features, size-shape features, and skin texture features, as depicted in Fig. 4.

1) Color features:
Color is the most important feature of date varieties and provides highly discriminative information for their classification. First, the Red, Green, and Blue channels are separated from the cropped RGB images. Then, the mean and the standard deviation are calculated for each channel. In addition, the minimum, maximum, and mean intensities are determined from the grayscale images. The minimum and maximum intensities are the intensities of the darkest and brightest pixels, respectively, and the mean intensity is the mean value of all pixels [10].
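As a rough illustration of the nine color features described above (mean and standard deviation per RGB channel, plus minimum, maximum, and mean gray intensity), the following Python sketch may help. It is not the authors' MATLAB code. The function name `color_features` is hypothetical, the grayscale weights (0.2989, 0.5870, 0.1140) are the common luminance weights and are an assumption here, and the population standard deviation is used, which may differ from the estimator used in the paper.

```python
import statistics


def color_features(rgb_pixels):
    """Return 9 color features from a list of (R, G, B) pixels of the cropped date.

    Order: mean_R, std_R, mean_G, std_G, mean_B, std_B, gray_min, gray_max, gray_mean.
    """
    features = []
    for channel in range(3):
        values = [pixel[channel] for pixel in rgb_pixels]
        features.append(statistics.fmean(values))
        features.append(statistics.pstdev(values))  # assumption: population std

    # Assumption: standard luminance weights for the RGB-to-gray conversion.
    gray = [0.2989 * r + 0.5870 * g + 0.1140 * b for r, g, b in rgb_pixels]
    features.extend([min(gray), max(gray), statistics.fmean(gray)])
    return features
```

Calling `color_features([(10, 20, 30), (30, 40, 50)])` returns a list of nine values whose first two entries are the mean and standard deviation of the red channel.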
2) Shape and size features: In addition to color, size and shape are important features for the classification of dates, and they enhance the classification accuracy. Several shape and size features can be obtained from the segmented images, such as area, major axis length, minor axis length, ellipse eccentricity, solidity, and perimeter (see Fig. 5). Equation 1 is used to calculate the eccentricity and Equation 2 computes the solidity [7,10,21].
Eccentricity (e) is computed as

$$e = \sqrt{1 - \left(\frac{\text{Minor axis length}}{\text{Major axis length}}\right)^{2}} \quad (1)$$

and solidity is computed as

$$\text{Solidity} = \frac{\text{Area}}{\text{Convex area}} \quad (2)$$

3) Texture features: Skin texture can be used to separate some varieties of palm dates. Therefore, it is essential to consider date texture as a feature. Statistical texture features can be calculated by the Gray Level Co-occurrence Matrix (GLCM) method [22]. The GLCM indicates how often a particular gray-level pixel pair $(i, j)$, at a distance $l$ and relative orientation $\theta$, occurs in a given image, and is represented by $P(i, j \mid l, \theta)$.

[Figure: image processing steps (Step 1 to Step 4) from the input RGB image to the output, including the crop of the region of interest using the vertical and horizontal coordinates]

The intensity values of an (m × n) image are given by the image function. The texture features computed from the GLCM include the following.
b) Energy is the sum of squared elements in the GLCM and is given by Equation 5.

$$\text{Energy} = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} p(i, j)^{2} \quad (5)$$

c) Correlation is the measure of the similarity (S) and is given by Equation 6.

$$S = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \frac{(i - \mu_i)(j - \mu_j)\, p(i, j)}{\sigma_i \sigma_j} \quad (6)$$

where $N_g$ is the number of distinct gray levels and $p(i, j)$ represents the $(i, j)$-th entry in the GLCM. The means in the row and column directions are $\mu_i$ and $\mu_j$, respectively. The standard deviations in the row and column directions are $\sigma_i$ and $\sigma_j$, respectively.
Only a single direction, $\theta = 0^{\circ}$, is considered in our work, leading to four features.
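The GLCM construction for the single direction of 0° can be sketched in Python as follows. This is an illustrative sketch, not the authors' MATLAB implementation. The function names are hypothetical, the image is assumed to be already quantized to integer gray levels 0 to `levels - 1`, and only the two features whose definitions are stated in the text (energy and correlation) are computed. The correlation is undefined for a constant image, where both standard deviations are zero.

```python
import math


def glcm_horizontal(image, levels, distance=1):
    """Normalized GLCM for pixel pairs at the given distance along 0 degrees."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for x in range(len(row) - distance):
            counts[row[x]][row[x + distance]] += 1
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]


def energy(p):
    """Sum of squared GLCM entries."""
    return sum(v * v for row in p for v in row)


def correlation(p):
    """Correlation of the GLCM using row/column means and standard deviations."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sigma_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sigma_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return cov / (sigma_i * sigma_j)
```

For the two-level toy image `[[0, 0, 1, 1], [0, 0, 1, 1]]`, the horizontal pairs (0,0), (0,1), and (1,1) each occur twice, so every nonzero GLCM entry equals 1/3.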

III. EXPERIMENTAL RESULTS AND DISCUSSIONS
A. Setup of the Artificial Neural Network
In this work, two-layer feed-forward neural networks are trained with the Levenberg-Marquardt backpropagation algorithm. The tansig and logsig functions are used as activation functions in the hidden neurons, and the softmax function is used in the output neurons [22]. Of the collected samples, 408 images are used for training the network, 72 images for validation, and 120 images for testing. Different network architectures, varying in the number of neurons in the hidden layer, are trained with different sets of features to identify the best architecture and the most valuable features. The training process is repeated thirty times and the results are averaged for the sake of reliability. Fig. 6 and Fig. 7 plot the accuracy of the ANN classifier using different features versus the number of hidden neurons for the logsig and tansig activation functions, respectively. The accuracy increases markedly as the number of hidden neurons increases from 1 to 3. When the number of neurons is between four and ten, the improvement in classification accuracy is slight.
In addition, the classification accuracy based on the four texture features is low compared with the other features. The accuracy ranges from 36.08% to 53.72% with the logsig function and from 35.36% to 54.36% with the tansig function. This indicates that the texture features contribute little to the classification of dates, whereas the color and shape features contribute to a high extent. The accuracy achieved using the color features ranges from 58.03% to 80.06% with the logsig function and from 62.58% to 79.67% with the tansig function. As shown in Fig. 6 and Fig. 7, the contributions of color and shape are comparable. The classification accuracy using the shape features ranges from 65.88% to 81.11% with the logsig function and from 63.67% to 81.19% with the tansig function. The classification improves when all features are used together: the accuracy is 96.22% with the logsig function and 96.21% with the tansig function. The accuracy improves further (97% and 97.26%, respectively) when the texture features are excluded and only the color and shape features are used. In our work, the best accuracy of 97.26% is achieved using a hidden layer of seven neurons with the tansig function, as depicted in Table I. With the logsig function, the highest accuracy is obtained with nine neurons in the hidden layer.
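The structure of the network described in this subsection (one hidden layer with tansig or logsig activation, followed by a softmax output over the six varieties) can be sketched as a forward pass in Python. This is only an illustration of the architecture: the weights below are placeholders, the function names are hypothetical, and the Levenberg-Marquardt training used by the authors is not shown.

```python
import math


def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, equal to tanh(x)."""
    return math.tanh(x)


def logsig(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Convert output scores into class probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out, activation=tansig):
    """Forward pass: features -> hidden layer -> softmax class probabilities."""
    hidden = [
        activation(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    scores = [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(w_out, b_out)
    ]
    return softmax(scores)
```

With 15 input features, 7 hidden neurons, and 6 outputs, `w_hidden` is a 7 × 15 list of lists and `w_out` is a 6 × 7 list of lists; the predicted variety is the index of the largest probability.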

B. Setup of the Support Vector Machine
In this approach, the training subset is partitioned into 10 parts. In each iteration, nine parts are used for training and the remaining part is used for validation. The rest of the dataset (20%) is used for testing. The kernel scale of the Radial Basis Function (RBF) kernel is set automatically, while the Support Vector Machine (SVM) optimization parameter C (the box constraint) is varied over five different values in the range [1e-1, 1e3].
The relation between the accuracy obtained with different features and the box constraint parameter C is illustrated in Fig. 8. The results show that the performance of SVM is similar to that of ANN. Again, the texture features give the lowest accuracy, in the range of 47.06% to 57.14%.
However, the combination of color and shape-size features reaches the highest accuracy of 97.1386% when the box constraint is 10. When the box constraint increases beyond 10, the accuracy obtained with the different features either decreases or increases by very small amounts. When all features are used, the accuracy is slightly lower than with the color and shape-size combination, which shows that the texture features can be ignored.
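The data partitioning described in this subsection (20% held out for testing, and the remaining training subset divided into 10 parts for cross-validation) can be sketched in Python as follows. The function names, the random shuffling, and the fixed seed are assumptions for illustration. The grid of C values assumes five logarithmically spaced values from 0.1 to 1000, which is consistent with the reported best value of C = 10 but is not stated explicitly in the paper. Training the SVM itself is not shown.

```python
import random


def holdout_and_folds(n_samples, test_fraction=0.2, k=10, seed=0):
    """Shuffle sample indices, hold out a test set, and split the rest into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # assumption: random partitioning
    n_test = round(n_samples * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    folds = [train[i::k] for i in range(k)]   # k nearly equal parts
    return train, test, folds


def cv_splits(folds):
    """Yield (fit_indices, validation_indices) with one fold held out per iteration."""
    for held_out, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield fit, validation


# Assumption: five box-constraint values, one per decade from 1e-1 to 1e3.
C_GRID = [10.0 ** exponent for exponent in range(-1, 4)]
```

With the 600 images of this study, the sketch yields 120 test samples and ten folds of 48 samples each, so each iteration fits on 432 samples and validates on 48.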

C. Setup of the K-Nearest Neighbor Classifier
The K-Nearest Neighbor (KNN) classifier is one of the most basic classifiers for data classification and pattern recognition [23]. Although the choice of the constant K and of the distance metric is critical, KNN has few parameters to tune and its efficiency is high, which is its main advantage in object classification. As illustrated in Fig. 9, KNN achieves the lowest performance compared with both ANN and SVM. The accuracy is 53.33% when the texture features are used with K = 7. The value of K affects the performance when the color and shape-size features are used. With K = 10, the accuracy is 70% using the color features and 82.5% using the shape-size features. The performance improves when the color and shape-size features are combined, with a value of K of only five.
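A minimal Python sketch of the KNN decision rule is given below. The function name is hypothetical, and the Euclidean distance is an assumption, since the paper does not state which distance metric was used. In practice the features would normally be scaled to comparable ranges before computing distances.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Return the majority label among the k training samples closest to the query."""
    # Assumption: Euclidean distance between feature vectors.
    neighbors = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]
```

Because every prediction requires the distance from the query to all training samples, the cost of classifying one sample grows with the size of the training set.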

D. Classifiers Performance using Confusion Matrix
The confusion matrix is used as a metric to measure the performance of the different classification algorithms. It evaluates the accuracy of the classification system on the training, validation, and testing datasets. The columns of the matrix represent the target class (the desired output), and the rows represent the output class (the output of the system). The ANN confusion matrices for the combined color and shape-size features, with the tansig and logsig functions used in the hidden neurons, are shown in Fig. 10 and Fig. 11, respectively. Table I summarizes the highest performances (%) achieved by the classifiers (ANN with tansig hidden neurons, SVM, and KNN with K = 5) using the color and shape-size features. When the tansig function is used as the neuron activation function, the performance of ANN is perfect (recall of 100%). With C = 10, the SVM performance is lower than that of ANN. SVM classifies 4 out of 6 classes perfectly (recall of 100%).
Khunaizi and Naghal are classified with recalls of 90.5% and 94.7%, respectively. SVM classifies Khalas, Qash, Naghal, and Maan with a precision of 100%, whereas Fardh and Khunaizi are classified with precisions of 91.3% and 95%, respectively. KNN gives the lowest performance: only two classes are classified perfectly (recall of 100%), and the remaining classes are classified with recalls ranging from 90.5% to 95.2%.
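The precision and recall values quoted above follow directly from the confusion matrix. The Python sketch below uses the layout described in this subsection (rows are output classes, columns are target classes). The function name is hypothetical and the small two-class matrix in the usage note contains made-up counts for illustration only, not results from the paper.

```python
def per_class_metrics(cm):
    """Return (accuracy, [(precision, recall), ...]) for a square confusion matrix.

    cm[r][c] = number of samples with output (predicted) class r and target class c.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    metrics = []
    for c in range(n):
        predicted_as_c = sum(cm[c])                      # row sum
        actually_c = sum(cm[r][c] for r in range(n))     # column sum
        precision = cm[c][c] / predicted_as_c if predicted_as_c else 0.0
        recall = cm[c][c] / actually_c if actually_c else 0.0
        metrics.append((precision, recall))
    return correct / total, metrics
```

For the hypothetical matrix `[[20, 1], [0, 19]]`, the first class has a recall of 100% but a precision of 20/21, because one sample of the second class was assigned to it.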

E. Time Complexity Analysis
Table II presents the testing times (in seconds) of the different classifiers. The classification times of ANN and SVM are almost the same. The ANN with tan-sigmoid hidden neurons classifies the testing samples in about 0.92 seconds/sample, which is the lowest classification time in this paper. The logsig and tansig neural networks reach their highest accuracy in very similar times. Table II suggests that the times taken by the two classifiers (ANN and SVM) are comparable; however, the simulation results show that SVM takes much more time to complete the classification. The KNN classifier has the highest classification time of 2.56 seconds/sample (i.e., it is the slowest algorithm), since it needs to calculate the distance from each testing sample to all the training samples whenever a classification is required.
Although combining features increases the achieved performance, using all 19 features increases the computational overhead, as shown in Fig. 6 to 9. Using irrelevant date features leads to the curse of dimensionality and decreases the performance of the classification system: as shown in Fig. 6 to 9, the classification accuracy decreases after the texture features are included (19 features). Choosing an appropriate feature dimension (15 features, combining the color features with the shape features) achieves a balanced performance.
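The paper does not describe how the per-sample testing time was measured. One plausible way to obtain a seconds-per-sample figure is sketched below in Python; the function name, the use of a wall-clock timer, and the choice to keep the best of several repeats are all assumptions for illustration.

```python
import time


def seconds_per_sample(classify, samples, repeats=3):
    """Average wall-clock time per sample, taking the fastest of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            classify(sample)                 # result is discarded; only time matters
        best = min(best, time.perf_counter() - start)
    return best / len(samples)
```

Here `classify` stands for any trained classifier wrapped as a function of one feature vector, so the same routine can time ANN, SVM, and KNN on identical test samples.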

IV. CONCLUSION
The potential of CV systems (the combination of color image processing and Machine-Learning techniques) for automatically classifying date fruit varieties in Oman has been investigated. Three ML techniques (ANN, SVM, and KNN) have been used and compared with each other on the classification task. Intensive experiments and a qualitative comparison have been conducted among the developed approaches. The combination of color and shape-size features gives the highest accuracy in all approaches. This implies that date fruits differ significantly in color and shape-size rather than in texture. Moreover, this combination represents an optimum solution that achieves maximum accuracy with fewer features and a better processing time. The highest classification accuracies obtained by the ANN, SVM, and KNN classifiers are 97.2581%, 97.1386%, and 95.83%, respectively. Thus, CV systems can be used effectively to classify date fruits and could serve as an automatic date separator in date packaging factories.