Birds Identification System using Deep Learning

Identifying birds is a challenging task for birdwatchers because of the similarity of birds' forms and image backgrounds and the watchers' lack of experience. An image-based computer system can therefore help birdwatchers identify birds. This study investigates the use of deep learning for bird identification, using a convolutional neural network to extract features from images. The investigation was performed on a database of 4340 images collected by the author in Jordan. Principal Component Analysis (PCA) was applied to layers 6 and 7, as well as to datasets obtained by merging the two layers with statistical operations: average, minimum, maximum, and combination of both layers. The datasets were evaluated with the following classifiers: Artificial Neural Networks, K-Nearest Neighbor, Random Forest, Naïve Bayes and Decision Tree, using accuracy, precision, recall and F-Measure as metrics. Among other findings, the results show that applying PCA to the deep features not only reduces the dimensionality, and therefore significantly reduces the training/testing time, but also increases identification accuracy, particularly with the Artificial Neural Networks classifier. Among the classifiers, Artificial Neural Networks achieved the best results: accuracy of 70.9908%, precision of 0.718, recall of 0.71 and F-Measure of 0.708.
Keywords—Birds identification; deep learning; convolutional neural networks (CNN); VGG-19; principal component analysis (PCA)


I. INTRODUCTION
Many people are interested in observing and studying wildlife, especially through birdwatching. Birdwatching helps preserve nature by observing bird behavior and migration patterns. However, identifying birds from images remains difficult for birdwatchers because of the similarity of birds' forms and image backgrounds and their lack of experience in this field [1].
As mentioned in [17], earlier techniques used bird voices or videos to predict species, but these techniques struggle to give accurate results because of background sounds from other birds and animals. Images can therefore be a better choice for identifying bird species. To implement this technique, a model is first trained on images of all bird species; the deep learning algorithm then converts an uploaded image to grayscale and applies the trained model to predict the best-matching species name for that image.
In recent years, artificial intelligence has also been applied to image-based birdwatching using different algorithms and methods [1][3][4][7][14]. This study differs from these works in applying the following operations to VGG-19 features: combining fc6/fc7, and taking the maximum, minimum and average of fc6/fc7. The field of birdwatching still needs more investigation to develop systems with new techniques that help identify birds.
The image database was collected in Jordan, where, according to [13], 434 bird species belonging to 66 families have been recorded.
This study aims to investigate the use of deep learning for bird identification, using VGG-19 to extract features from images. To achieve this aim, the performance of different classifiers (KNN, Decision Tree, Random Forest, Naïve Bayes and ANN) was evaluated on the collected database of reliable images of birds found in Jordan.
VGG-19 is considered one of the most important Convolutional Neural Network (CNN) models, and CNNs are considered the strongest deep learning technique for image identification [9].
The main reason for using VGG-19 is to achieve high precision by capturing distinctive details in the image, such as differences in lighting conditions and other objects surrounding the birds [3]. In addition, PCA can be used as a dimensionality reduction tool on these features, reducing the number of features and therefore the training time.
The motivation for this study is as follows: 1) the shortage of work on identifying birds from images; 2) to the best of our knowledge, no previous study has used VGG-19 to identify birds; 3) there is a shortage of bird image databases worldwide, apart from the two databases in [1] and [18]. This also applies to Jordan, which has no bird image database and no program developed to identify birds.
Based on the features extracted with VGG-19, this study contributes to the research field a comparison of the results of the aforementioned classifiers.
This study is organized into six sections. Section II reviews previous studies on related subjects. Section III describes the database used. Section IV discusses the model design and the experimental methodology. Section V discusses the experimental results, and Section VI concludes the paper.

II. RELATED WORK
Machine learning (ML) is a set of techniques that allow systems to discover, from raw data, the representations needed for feature detection or classification. The performance of a classification system depends on the quality of its features. Since this study falls under ML, this section reviews the ML studies on bird identification.
The literature contains a number of studies on bird identification, using different algorithms and methods. Some studies identify birds from audio or video, such as [4], [11], [6] and [10], while others identify birds from images using AI algorithms [1], [3], [14], but not in the way done in this study, which applies the MAX, MIN, AVERAGE and combine operations between the fc6/fc7 layers of VGG-19.
In the field of image-based bird databases and bird identification systems, the researchers in [19] collected data on 200 bird species, mostly from North America, which they called Caltech-UCSD Birds 200 (CUB-200). Their study was based on two simple features: image size and color histograms. For image size, each image was represented by its width and height in pixels. For color histograms, they used 10 bins per channel, to which Principal Component Analysis was applied. Their results showed how the performance of the NN classifier degrades as the number of classes in the dataset increases, as in [18]. The image-size features performed close to chance, at 0.6% for the 200 classes, while the color-histogram features increased performance to 1.7%. In another study on image-based bird databases and identification systems, the researchers in [18] increased the number of images to 11,788, up from 6033 in [19]. They used RGB color histograms and histograms of vector-quantized SIFT descriptors with a linear SVM, obtaining a classification accuracy of 17.3%.
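To make the color-histogram feature of [19] concrete, the sketch below computes a 10-bin histogram for each RGB channel and concatenates them into a 30-dimensional vector. The 8-bit pixel range, the per-channel normalization and the function name `color_histogram` are assumptions for illustration, not details taken from [19]; the subsequent PCA step is not shown.

```python
import numpy as np


def color_histogram(image: np.ndarray, bins_per_channel: int = 10) -> np.ndarray:
    """Return per-channel color histograms of an H x W x 3 image, concatenated.

    Assumes 8-bit RGB values in [0, 255]. Each channel histogram is
    normalized to sum to 1, so images of different sizes are comparable.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    features = []
    for channel in range(3):
        counts, _ = np.histogram(
            image[:, :, channel], bins=bins_per_channel, range=(0, 256)
        )
        features.append(counts / counts.sum())
    return np.concatenate(features)


# Usage on a synthetic image (a stand-in, not data from [19]).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
feature = color_histogram(image)
print(feature.shape)  # (30,)
```

With 10 bins per channel, each image becomes a short, size-independent vector, which is why such features are cheap but weak for 200 visually similar classes.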
Also in the field of bird identification systems, the researchers in [14] proposed a new feature to distinguish bird types: the ratio of the distance from the eye to the beak root to the beak width. This feature was integrated into a decision tree and then into an SVM, and applied to the CUB-200-2011 dataset described in [18], achieving a correct classification rate of 84%.
Another study on bird identification, [1], used a database collected by its authors in India, consisting of 300-400 images covering a number of bird species. Image features were extracted with AlexNet and then classified with an SVM classifier, achieving an accuracy of 85%.
The researchers in [11] used multiple pre-trained CNNs (AlexNet, VGG19 and GoogleNet) on the Caltech-UCSD Birds-200-2011 dataset. Their approach of combining these networks improved accuracy, reaching 81.91% on Caltech-UCSD Birds-200-2011, compared with the other datasets used in the same study.
Another study, [4], addressed image-based bird databases and identification systems by classifying birds in flight from video clips. The researchers collected approximately 952 clips and extracted about 161,907 frames of 13 bird species. To improve accuracy, they used two kinds of features: appearance and motion features. They then compared their proposed method with the VGG and MobileNet classifiers. The proposed method achieved a 90% correct classification rate using a Random Forest classifier.
In the field of bird identification systems, the researchers in [3] applied three methods: 1) softmax regression on manually extracted features from the Caltech-UCSD-Birds-200 dataset [19]; 2) a multi-class SVM on HOG and RGB features extracted from the images; and 3) a CNN trained with transfer learning to classify birds. In their comparison of the three methods, the CNN achieved an accuracy of 46%.
The next section explains the database content, the number and source of images, and the challenges in classifying them.

III. DATABASE DESIGN
The bird image database was collected in Jordan and consists of 4340 images of 434 bird species. The images were obtained from scientific sources and approved by the Jordanian Bird Watching Association based on their scientific names [13].
The images have varied backgrounds: some were taken in shadow, some against bright backgrounds, and some contain other objects in the background. This added a considerable challenge to extracting features and achieving high accuracy.

IV. PROPOSED METHOD
This section presents the procedures of the proposed method for identifying birds using VGG-19. Fig. 1 shows the proposed model, which consists of the following steps:
Step 1): Feature vectors are extracted automatically from the images using the pre-trained VGG-19 in MATLAB, building datasets from the feature layers fc6 and fc7. Each dataset (e.g., fc6) contains 4096 columns (the feature vector) and 4340 rows (the samples, i.e., images).
Step 2): The statistical operations (min, max, average, and combine) were performed on the original fc6 and fc7 layers to obtain new datasets for the subsequent stages (Steps 3 and 4). The operations are as follows:
• Max: takes the larger of the two corresponding values in fc6 and fc7 and places it in a new dataset.
• Min: takes the smaller of the two corresponding values in fc6 and fc7 and places it in a new dataset.
• Average: takes the average of the two corresponding values in fc6 and fc7 and places it in a new dataset.
• Combine: places the first group (4096 features) next to the second group (4096 features), producing a new dataset with 8192 features.
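The four operations of Step 2 can be sketched in a few lines of NumPy. This assumes fc6 and fc7 have already been extracted (the study used MATLAB) and loaded as two matrices whose row i describes the same image; the function name `fuse_fc6_fc7` and the random stand-in data are hypothetical.

```python
import numpy as np


def fuse_fc6_fc7(fc6: np.ndarray, fc7: np.ndarray) -> dict:
    """Build the four Step 2 datasets from matching fc6/fc7 feature matrices.

    Row i of both matrices must describe the same image. Max, min and
    average are element-wise; 'combine' concatenates the columns.
    """
    if fc6.shape != fc7.shape:
        raise ValueError("fc6 and fc7 must have the same shape")
    return {
        "max": np.maximum(fc6, fc7),
        "min": np.minimum(fc6, fc7),
        "average": (fc6 + fc7) / 2.0,
        "combine": np.hstack([fc6, fc7]),
    }


# Random stand-ins for 5 images; the study's matrices are 4340 x 4096.
rng = np.random.default_rng(1)
fc6 = rng.random((5, 4096))
fc7 = rng.random((5, 4096))
datasets = fuse_fc6_fc7(fc6, fc7)
for name, data in datasets.items():
    # max, min and average keep 4096 columns; combine has 8192.
    print(name, data.shape)
```

Max, min and average keep the 4096-column width, while combine doubles it, which is why the combined dataset benefits most from the PCA step that follows.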
Step 3): PCA is applied to the original fc6 and fc7 datasets and to the datasets obtained in Step 2, producing new datasets.
The feature vectors obtained from the pre-trained VGG-19 are very large (4096 features), so PCA was used to reduce the number of features. Three thresholds of retained variance were used: 95%, 97% and 99% of the variance of the data (the 4096 features).
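A retained-variance threshold can be turned into a number of components as sketched below: center the data, take its SVD, and keep the fewest components whose cumulative explained variance reaches the threshold. This is a generic PCA sketch, not the study's MATLAB code; the function name `pca_by_variance` and the synthetic data are hypothetical. In a train/test setting, the projection would normally be fitted on the training split only.

```python
import numpy as np


def pca_by_variance(X: np.ndarray, variance: float):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches `variance` (e.g. 0.95, 0.97 or 0.99).

    Returns the projected data and the number of components kept.
    """
    Xc = X - X.mean(axis=0)  # PCA assumes centered columns
    # Economy-size SVD: rows of Vt are the principal axes and
    # S**2 is proportional to the variance along each axis.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S**2) / np.sum(S**2)
    # First index where the cumulative ratio reaches the threshold, clamped
    # so floating-point round-off near 1.0 cannot exceed the available axes.
    k = min(int(np.searchsorted(ratio, variance)) + 1, len(S))
    return Xc @ Vt[:k].T, k


# Synthetic data: three independent columns with standard deviations 20, 1, 0.1.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3)) * np.array([20.0, 1.0, 0.1])
for v in (0.95, 0.97, 0.99):
    Z, k = pca_by_variance(X, v)
    print(v, k, Z.shape)
```

Higher thresholds keep more components, which matches the pattern reported later: the 97% and 99% settings yield many more features than 95%, and hence much longer ANN training times.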
Step 4): The set of classifiers was applied to the datasets obtained in Steps 2 and 3.

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
This section presents the performance evaluation results on the study dataset, including accuracy, F-measure, recall, precision and training time for each classifier: 1KNN, 3KNN, 5KNN, ANN, Naïve Bayes, Random Forest and Decision Tree.
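The paper does not state how per-class precision, recall and F-measure are averaged over the 434 species. The sketch below assumes a support-weighted average (each class weighted by its number of true examples); the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Accuracy (%) plus support-weighted precision, recall and F-measure."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    precision, recall, f_measure, support = [], [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f_measure.append(2 * p * r / (p + r) if p + r else 0.0)
        support.append(tp + fn)  # number of true examples of class c
    weights = np.array(support) / len(y_true)
    return {
        "accuracy": float(100.0 * np.mean(y_true == y_pred)),
        "precision": float(np.dot(weights, precision)),
        "recall": float(np.dot(weights, recall)),
        "f_measure": float(np.dot(weights, f_measure)),
    }


print(evaluate([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

Under this weighting, recall always equals accuracy expressed as a fraction, because each class's recall is weighted by its share of the examples. That would be consistent with the ANN figures reported in this paper (70.9908% accuracy, 0.71 recall).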
The results are presented as follows. Table I shows the results for the original fc6 and fc7 datasets. Naïve Bayes achieved the highest accuracy for fc6 and fc7, at 59.002% and 56.106%, respectively. Regarding the time spent on training and testing, Decision Tree took the longest (1406.69 s), while the KNNs took the least (0 s) compared with the other classifiers. This is because KNN builds no training model: each test example is compared directly with the examples in the training set, which is also why it is slow at testing, particularly with a large number of training examples [8][16]. These results match those in [5][12].
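The "no training model" behavior of KNN can be seen in the minimal sketch below: the training data is only stored, and every distance is computed at prediction time. Euclidean distance, majority voting and the function name `knn_predict` are assumptions for illustration; the paper does not specify its KNN settings beyond k.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=1):
    """Label each test row by majority vote among its k nearest training rows.

    There is no training step: all distances are computed at prediction
    time, so the cost of KNN falls on testing rather than training.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, shape (n_test, n_train).
    d2 = (
        (X_test**2).sum(axis=1)[:, None]
        + (X_train**2).sum(axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    predictions = []
    for idx in nearest:
        labels, counts = np.unique(y_train[idx], return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # ties: first label in sorted order
    return np.array(predictions)


X_train = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
y_train = ["a", "a", "b", "b", "b"]
X_test = [[0.2, 0.3], [5.5, 5.5]]
print(knn_predict(X_train, y_train, X_test, k=1))  # ['a' 'b']
print(knn_predict(X_train, y_train, X_test, k=3))  # ['a' 'b']
```

The distance matrix has one entry per test/training pair, so testing cost grows with the size of the training set, matching the explanation above.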

B. Results of the Statistical Operations on fc6 and fc7 Datasets
This section shows the results for the three datasets obtained by applying the statistical operations (average, maximum and minimum) between the fc6 and fc7 layers [15].

C. Results of Combining the (Original fc6/fc7) Datasets
A new dataset, called combine, was obtained by combining fc6 (4096 features) and fc7 (4096 features) into a feature vector of 8192 features. Tables IV and V show the identification results for each classifier after applying PCA (95%, 97% and 99%).
Applying PCA reduced the training time on fc6 for all classifiers (Table IV, after applying PCA) compared with the training times before applying PCA (Tables I to III), especially for Random Forest and Naïve Bayes. The highest accuracies after applying PCA at 95%, 97% and 99% were achieved by ANN, at 68.8018%, 70% and 62.3733%, respectively, which can be attributed to the reduced feature vector.
It is worth mentioning that the ANN classifier was used only with the datasets obtained after applying PCA, because its training time on the other datasets was unacceptable. This matches previous studies reporting that ANN training takes longer than other classifiers [2][15]. The second-highest accuracy at all PCA thresholds (95%, 97% and 99%) was achieved by Naïve Bayes, at 58.3641%, 56.9585% and 56.3825%, respectively.

E. Results of the Statistical Operations on (fc6 and fc7) after Applying PCA
This section presents the identification results of the statistical operations (average, maximum and minimum) between fc6 and fc7 after applying PCA. The highest accuracy with PCA (95%) is achieved by ANN, at 66.9816%. ANN results are reported only for PCA (95%), not for 97% and 99%, because of the large number of features at those thresholds (1428 and 2117, respectively), for which ANN training time is unacceptable (it takes days to produce results). ANN also required a long time for training and testing (54151.88 s).
Table VIII shows the bird identification results for the minimum between fc6 and fc7, where the highest accuracy with PCA (95%) is achieved by ANN, at 70.8295%. Again, ANN results are reported only for PCA (95%), not for 97% and 99%, because of the large number of features at those thresholds (1205 and 1910, respectively) and the resulting unacceptable training time (days). Naïve Bayes achieved accuracies of 48.7327%, 44.1014% and 35% with PCA at 95%, 97% and 99%, respectively. ANN again required a long time for training and testing (42677.02 s).

F. Results of Combining Feature Vector after Applying PCA
This section shows the results of combining fc6 (4096 features) and fc7 (4096 features) into 8192 features, which PCA (95%, 97% and 99%) reduced to 250, 440 and 1080 features, respectively. Table IX shows the bird identification results for the combination of fc6 and fc7, where ANN achieved the highest accuracy with PCA at 95%, 97% and 99%: 69.5392%, 70.9908% and 67.9263%, respectively. The second-highest accuracy at all PCA thresholds was achieved by Naïve Bayes: 57.235%, 54.1475% and 43.7558%, respectively.
ANN required a long time for training and testing (56279.29 s).
G. Comparison between the Proposed Work and Previous Works
Table X compares the results of the proposed approach with three similar approaches to bird identification.
Table X shows that the proposed approach can be considered notable compared with previous research, for several reasons: 1) Some previous studies, such as [4] and [7], were conducted on datasets with few bird categories (13 and 16, respectively), compared with 434 in this study.
2) Other previous studies, such as [4], [3] and [14], were conducted on datasets containing a large number of training images (examples), e.g., 161,907 in [4]. This lends more confidence to the results of this study.
3) Previous studies identified birds from audio/video using different algorithms and methods, such as [4], [11], [6] and [10], or from images using AI algorithms [1], [3], [17]. None of them used the approach of this study, which applies deep learning with the statistical operations MAX, MIN, AVERAGE and combine between the fc6/fc7 layers of VGG-19.

VI. CONCLUSION
This study investigated the use of deep learning for a bird identification system, using VGG-19 to extract features from images. VGG-19 is a pre-trained convolutional neural network (CNN) used for image identification; in this paper, it was used to extract features from bird images.
The study database contains 4340 images of 434 bird species, obtained from scientific sources and approved by the Jordanian Bird Watching Association based on their scientific names.
Features were taken from two layers of VGG-19: layer 6 (fc6) and layer 7 (fc7), each consisting of 4096 features.
Since the deep feature vector obtained from VGG-19's layer 6 or 7 is very large (4096 features), we used Principal Component Analysis (PCA) for dimensionality reduction. In addition, statistical operations (average, minimum, maximum and combination of both layers) were used to generate more datasets from fc6 and fc7.
The created datasets (with and without PCA), as well as the datasets created by the statistical operations, were used as input to various machine learning classifiers: Artificial Neural Networks (ANN), K-Nearest Neighbor (KNN), Random Forest, Naïve Bayes and Decision Tree.
Among other findings, the results show that applying PCA to the deep features not only reduces the dimensionality, and therefore significantly reduces the training/testing time, but also increases identification accuracy, particularly with the ANN classifier. Among the classifiers, ANN achieved the best results: accuracy of 70.9908%, precision of 0.718, recall of 0.71 and F-Measure of 0.708.
Further investigation is recommended to improve accuracy and reduce training time using different algorithms.