Pre-Trained Convolutional Neural Network for Classification of Tanning Leather Image

Leather craft products such as belts, gloves, shoes, bags, and wallets mainly originate from cow, crocodile, lizard, goat, sheep, buffalo, and stingray skin. Before the skins are used as leather craft materials, they go through a tanning process. With the rapid development of the leather craft industry, an automation system for leather tanning factories is important for achieving large-scale production that meets the demand for leather craft materials. The challenge in automatic leather grading based on the type and quality of leather is that skin color and texture after tanning vary widely within the same skin category and are highly similar across skin categories. Furthermore, skin from different parts of the animal body may have different color and texture. Therefore, a leather classification method for tanning leather images is proposed. The method uses a pre-trained deep convolutional neural network (CNN) to extract rich features from a tanning leather image and a Support Vector Machine (SVM) to classify the features into several types of leather. Performance evaluation shows that the proposed method classifies various types of leather with good accuracy and is superior to another state-of-the-art leather classification method in terms of accuracy and computational time.
Keywords: Leather classification; tanning leather; convolutional neural network (CNN); deep learning; support vector machine (SVM)


I. INTRODUCTION
Small and medium-sized industries have recently experienced rapid growth. One of them is the leather craft industry, which produces various leather craft items such as gloves, wallets, belts, sandals, shoes, jackets, and bags. This growth also pushes leather tanning factories to increase their production in order to meet the demand for leather craft materials. To achieve large-scale production, an automation system should be implemented in leather tanning factories.
The automation system involves automatic grading based on the type and quality of leather. The type of leather is usually distinguished by color and texture, using global and/or local statistical geometrical features with a machine learning approach [1], [2]. The quality of leather is mainly determined by the size and location of leather defects. Leather defects can be categorized into five types: lines, holes, stains, wears, and knots [3]. To locate the defects, researchers use morphological operations [4], [5], clustering [6], [7], or machine learning approaches [3], [8]-[10].
Leather craft materials usually come from cow, crocodile, lizard, buffalo, goat, sheep, and stingray skin. The animal skin goes through a tanning process before it can be used as crafting material. At every stage of the tanning process, the tanning agent alters the physical properties and chemical composition of the skin. The skin becomes durable and pliant, and may have a different color and texture from the original. Therefore, after tanning, animal skins show a large variety of color and texture within the same skin category and high similarity to other skin categories, which makes them difficult to distinguish. Furthermore, skin from different parts of the animal body may have different color and texture.
The result of classification determines the grade of tanned leather, and the grade affects the price. Classification is therefore the most important procedure in an automation system for tanning leather production, because it directly affects the price of the final tanning leather products. Furthermore, a high return rate and disputes between customers and manufacturers caused by misclassification of leather usually incur additional costs [4].
From the leather craftsman's point of view, a correct leather grading result is important because it is used to decide which type of leather craft product will be made. Mistakes in determining the type of leather can make the resulting products unfavorable and lead to lost sales. Leather craftsmen in some areas do not yet know the applicable standards of leather crafting. The type and quality of leather used as craft material are judged from experience and tradition inherited from generation to generation. Therefore, this research proposes a method to classify the type of leather from a tanning leather image and carries out a performance comparison to evaluate the method. In summary, the contributions of this work are as follows:
1) The proposed method uses tanning leather images as input, because tanning leathers are the final product of the leather tanning factory and become the materials for the leather craft industry. The proposed method can therefore be used by, and will benefit, both the leather tanning factory and the leather craft industry.
2) The features of tanning leather images are extracted using a specific layer of a pre-trained deep convolutional neural network (CNN). The deeper the layer, the richer the features extracted from the tanning leather images.
3) Classification is done using a linear Support Vector Machine (SVM). SVM performs well with few training data, has parameters that are easy to configure, and has the potential to perform real-time classification.
4) Finally, the performance of the proposed method is compared with another state-of-the-art leather classification method.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 9, No. 1, 2018, 213 | Page, www.ijacsa.thesai.org
The rest of this paper is organized as follows: Section II presents the related works, Section III presents the proposed leather classification method, Section IV presents the results and discussion, and Section V concludes this work.

II. RELATED WORKS
Leather classification methods have been proposed and developed by many researchers. Most of them focus on building an automation system to detect leather defects [3]-[10]. A leather defect inspection system detects the size and location of defects on the leather surface, which is one of the characteristics that determine the quality of leather, along with others such as the type of leather and the correlation between usable and unusable areas of the leather. In this research, the authors focus on classifying the type of leather, as it is also an important aspect that affects the quality of leather.
In tanning leather classification, the difficulties come from the large variety and similarity of color and texture of tanning leather. Researchers use statistical geometrical features with a machine learning approach to classify the type of leather. In [1], improved statistical geometrical features (ISGF) are proposed to classify chrome tanning and vegetable tanning leather. Two classifiers, based on the Fisher criterion and a Learning Vector Quantization (LVQ) network, are used to compare chrome tanning and vegetable tanning leather. The results show that ISGF outperforms SGF.
In [2], both global and local features are used to classify leather. The global features are extracted from the two-dimensional power spectrum of the enhanced leather trench image. The local features are based on mathematical morphology operations on the segmented leather image. The features are then compared with a set of training images that have already been classified by experts.
In this research, the authors use rich features obtained from a pre-trained CNN. Such features have also been used in other classification problems [11]-[14]. The features are classified using SVM into several types of leather. The authors apply the same procedure and the same training data to another leather classification scheme and compare its performance. That scheme uses hand-crafted features: color moments and several statistical measurements from the Gray Level Co-Occurrence Matrix (GLCM). GLCM is a well-known texture feature extraction method and has been used for texture-based classification in a wide range of applications [15]-[18]. These features are also classified using SVM.

III. METHODOLOGY
This research proposes a method to classify the type of leather for use in the quality measurement of leather craft materials. The general procedure of the proposed method is shown in Fig. 1. The tanning leather image is first resized to fit the input of the pre-trained CNN. The authors extract the features from a fully connected layer of AlexNet [19], which was originally trained for the ImageNet challenge [20]; the specific layer is given in Section IV-B. The features are then classified using a linear SVM into five types of leather: monitor lizard, crocodile, sheep, goat, and cow.

A. Tanning Leather Image
Tanning is the process of making leather from animal skins. Tanning agents alter the physical properties and chemical composition of the leather. The result of tanning is a more durable, pliant, and texturally practical material. There are three methods of tanning leather: chrome, vegetable, and combination tanning. Each method produces a material that differs aesthetically and texturally, e.g. chrome-tanned leather is soft and oily, while vegetable-tanned leather is sturdy and hefty. Because tanning agents can change the color and texture of the leather surface, leather images acquired from the industry vary widely even for the same animal. Additionally, the images can be degraded by external noise.

B. Hand-Crafted Feature Extraction
Hand-crafted leather classification is usually based on color and texture. The most common statistical color features are color moments, and the most common statistical texture features are derived from the Gray Level Co-Occurrence Matrix (GLCM).

1) Color features extraction:
Color features can be extracted using color moments. Color moments characterize the color distribution of an image. They consist of the mean, standard deviation, skewness, kurtosis, and other higher-order moments, and they are scale and rotation invariant. The color moments used in this research are as follows:
a) Mean is the average color value of the image. If $N$ is the number of pixels in the image and $p_i$ is the $i$-th pixel of the image, then the mean can be calculated using (1).
$$\mu = \frac{1}{N}\sum_{i=1}^{N} p_i \tag{1}$$
b) Standard deviation is the square root of the variance. The standard deviation can be calculated using (2).
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(p_i - \mu\right)^2} \tag{2}$$
c) Skewness is a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right. The skewness can be calculated using (3).
$$s = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(p_i - \mu\right)^3}{\sigma^3} \tag{3}$$
d) Kurtosis is a measure of how outlier-prone a distribution is. Kurtosis can be calculated using (4).
$$k = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(p_i - \mu\right)^4}{\sigma^4} \tag{4}$$
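The four color moments described above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' MATLAB code: the function name `color_moments` is hypothetical, and the 1/N (population) normalization of the variance is an assumption, since the text does not state which normalization was used.

```python
import numpy as np

def color_moments(channel):
    """Return (mean, standard deviation, skewness, kurtosis) of one image channel.

    Assumption: all moments use 1/N normalization.
    """
    p = np.asarray(channel, dtype=np.float64).ravel()
    mu = p.mean()
    centered = p - mu
    sigma = np.sqrt(np.mean(centered ** 2))
    if sigma == 0:
        # Flat image: skewness and kurtosis are undefined, report 0 by convention.
        return mu, 0.0, 0.0, 0.0
    skewness = np.mean(centered ** 3) / sigma ** 3
    kurtosis = np.mean(centered ** 4) / sigma ** 4
    return mu, sigma, skewness, kurtosis
```

Applied to a grayscale leather image, the function yields the four color features of the hand-crafted scheme; for a color image it could be called once per channel.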
2) Texture features extraction:
Texture features can be extracted from the Gray Level Co-Occurrence Matrix (GLCM), which captures the spatial correlation between pixels. Feature extraction using GLCM counts how often pairs of pixels with specific values occur in the image and then calculates statistical texture features from these counts [21]. The features can be measured for four different orientations or offsets, which are $0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$. The statistical measurements of GLCM used in this research are as follows:
a) Contrast is a measure of the intensity contrast between a pixel and its neighbor over the whole image. Contrast is used to measure the local variance level in the GLCM. If $p(i,j)$ is the GLCM value at coordinate $(i,j)$, then contrast can be calculated using (5).
$$\text{Contrast} = \sum_{i,j} |i - j|^2\, p(i,j) \tag{5}$$
b) Correlation is a measure of how correlated a pixel is to its neighbor over the whole image. Correlation is used to measure the occurrence of paired pixels in the GLCM. If $\mu_i = \sum_{i,j} i\, p(i,j)$ and $\mu_j = \sum_{i,j} j\, p(i,j)$ are the means of the GLCM along $i$ and $j$, and $\sigma_i = \sqrt{\sum_{i,j} (i - \mu_i)^2\, p(i,j)}$ and $\sigma_j = \sqrt{\sum_{i,j} (j - \mu_j)^2\, p(i,j)}$ are the standard deviations of the GLCM along $i$ and $j$, then correlation can be calculated using (6).
$$\text{Correlation} = \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j)\, p(i,j)}{\sigma_i \sigma_j} \tag{6}$$
c) Energy is the sum of squared elements in the GLCM and can be calculated using (7).
$$\text{Energy} = \sum_{i,j} p(i,j)^2 \tag{7}$$
d) Homogeneity is a value that measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal and can be calculated using (8).
$$\text{Homogeneity} = \sum_{i,j} \frac{p(i,j)}{1 + |i - j|} \tag{8}$$
e) Entropy is a measure of the non-uniformity and texture complexity of the image. Entropy can be calculated using (9).
$$\text{Entropy} = -\sum_{i,j} p(i,j) \log p(i,j) \tag{9}$$
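The GLCM measurements listed above can be illustrated with a small NumPy sketch. It is an assumption-laden illustration rather than the authors' implementation: the helper names (`glcm`, `glcm_features`, `texture_vector`), the choice of 8 gray levels, a pixel distance of 1, the `1 + |i - j|` denominator for homogeneity, and the natural logarithm in the entropy are all assumptions that the text does not specify.

```python
import numpy as np

# The four orientations 0, 45, 90, 135 degrees as (dx, dy) steps of one pixel.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]

def glcm(gray, dx, dy, levels):
    """Normalized co-occurrence matrix of pixel pairs (r, c) -> (r + dy, c + dx).

    `gray` must hold integer gray levels in the range [0, levels).
    """
    gray = np.asarray(gray)
    rows, cols = gray.shape
    # Ranges for which both pixels of a pair lie inside the image.
    r0, r1 = max(0, -dy), min(rows, rows - dy)
    c0, c1 = max(0, -dx), min(cols, cols - dx)
    src = gray[r0:r1, c0:c1].ravel()
    dst = gray[r0 + dy:r1 + dy, c0 + dx:c1 + dx].ravel()
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (src, dst), 1.0)  # accumulate repeated pairs correctly
    return counts / counts.sum()

def glcm_features(p):
    """Return (contrast, correlation, energy, homogeneity, entropy) of a GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    contrast = ((i - j) ** 2 * p).sum()
    denom = sd_i * sd_j
    # Correlation is undefined for a constant row or column; report 0 then.
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / denom if denom > 0 else 0.0
    energy = (p ** 2).sum()
    homogeneity = (p / (1.0 + np.abs(i - j))).sum()
    nonzero = p[p > 0]  # skip empty cells so that log(0) never occurs
    entropy = -(nonzero * np.log(nonzero)).sum()
    return contrast, correlation, energy, homogeneity, entropy

def texture_vector(gray, levels=8):
    """Concatenate the five measurements over the four offsets (20 values)."""
    features = []
    for dx, dy in OFFSETS:
        features.extend(glcm_features(glcm(gray, dx, dy, levels)))
    return np.array(features)
```

A 256-level grayscale image would first need to be quantized to `levels` gray levels, for example with `gray // (256 // levels)`, before being passed to `texture_vector`.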

C. Pre-Trained CNN for Feature Extractor
Convolutional Neural Network (CNN) is a powerful machine learning technique. In the deep learning field, CNNs are trained using large collections of diverse images, from which they learn rich feature representations for a wide range of images. CNNs are built from many types of layers, namely the input layer, convolutional layers, Rectified Linear Unit (ReLU) layers, cross-channel normalization layers, average pooling layers, max pooling layers, fully connected layers, dropout layers, softmax layers, and classification output layers. Each layer in the network takes in data from the previous layer, transforms the data, and passes it on to the next layer. The network learns directly from the data and increases the complexity and detail of what it learns from layer to layer. The functions of some layers are as follows:
a) The convolutional layer puts the input images through a set of convolutional filters, each of which activates certain features from the images.
b) The pooling layer simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn.
c) The ReLU layer allows faster and more effective training by mapping negative values to zero and maintaining positive values.
d) The last fully connected layer outputs a vector of K dimensions, where K is the number of classes that the network is able to predict. After the softmax function is applied, this vector contains the probabilities of each class for the image being classified.
e) The final layer of the CNN architecture uses a softmax function to provide the classification output.
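To make the role of each layer type concrete, the following NumPy sketch pushes one tiny single-channel image through a convolution, a ReLU, a max pooling step, a fully connected layer, and a softmax. It is a toy illustration with random weights and hypothetical function names, not AlexNet and not the network used in this work.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of one channel with one filter."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Map negative values to zero and keep positive values."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling (nonlinear downsampling)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x, weights, bias):
    """Flatten the input and produce one score per output unit."""
    return weights @ x.ravel() + bias

def softmax(scores):
    """Turn K scores into K probabilities that sum to 1."""
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((6, 6))                       # toy 6x6 input
kernel = rng.standard_normal((3, 3))             # one random 3x3 filter
hidden = max_pool(relu(conv2d(image, kernel)))   # 6x6 -> 4x4 -> 2x2
scores = fully_connected(hidden, rng.standard_normal((5, 4)), np.zeros(5))
probs = softmax(scores)                          # K = 5 class probabilities
```

When a network is used as a feature extractor, the vector taken from a fully connected layer (here `scores`) is what gets passed to a separate classifier instead of the softmax output.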
One way to leverage the power of CNNs without investing time and effort into training is to use a pre-trained CNN as a feature extractor. The layer used as the feature extractor is a fully connected layer, which extracts richer features than the lower layers. Most pre-trained models have been trained on the ImageNet dataset [20], which has 1000 object categories and 1.2 million training images. One such model is AlexNet [19], which was published in 2012. It can classify images into 1000 different categories, including keyboards, computer mice, pencils, and other office equipment, as well as various breeds of dogs, cats, horses, and other animals. The architecture of AlexNet is shown in Fig. 2.

D. Tanning Leather Image Classification
A Support Vector Machine (SVM) is used to classify the tanning leather image from the set of features extracted using the pre-trained CNN. SVM is a classifier defined by a separating hyperplane. The hyperplane is found by maximizing the margin, i.e. the distance between two sets of objects from two different classes. Classification with SVM consists of training and classification. In the training step [22], given training data $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, in two classes and a label vector $y \in \mathbb{R}^l$ such that $y_i \in \{1, -1\}$, the SVM solves the optimization problem (10).
$$\min_{w,\,b,\,\xi}\ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i\left(w^T \phi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, l \tag{10}$$
where $w$ is the weight vector, $b$ is the bias, $\xi_i$ are the slack variables, $\phi(x_i)$ maps $x_i$ into a higher dimensional space, and $C > 0$ is the regularization parameter. The decision function is shown in (11).
$$f(x) = \operatorname{sgn}\left(w^T \phi(x) + b\right) = \operatorname{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i K(x_i, x) + b\right) \tag{11}$$
where $K(x_i, x) = \phi(x_i)^T \phi(x)$ is the kernel function and $\alpha_i$ are the coefficients obtained from the dual problem during training. After the training process, the parameters $y_i \alpha_i$, $b$, the label names, the support vectors, and the kernel parameters are saved as the output model of SVM training. In classification, a voting strategy is performed for each data point x, which is assigned to the class with the maximum number of votes [22]. The optimal parameter is selected using k-fold cross validation, which divides the training data into k sets, using (k-1) sets as training data and the remaining set as test data. After the training process, the variables w, x, and b are available for each class, and the classification process can be done with these steps:
2) Calculate the decision function using (11).
3) Repeat steps 1 and 2 for the other classes.
4) Determine the class as the one whose function gives the maximum result.
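The classification steps above, together with the k-fold split used for parameter selection, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: a linear kernel, one trained model per class with the class chosen by the largest decision value, and hypothetical names (`decision_value`, `classify`, `kfold_indices`) and a hypothetical model layout. It does not reproduce the voting strategy of [22] and does not train the SVM.

```python
import numpy as np

def decision_value(x, support_vectors, coef, bias):
    """Evaluate sum_i (y_i * alpha_i) * K(x_i, x) + b with a linear kernel.

    `coef` holds the stored products y_i * alpha_i, one per support vector.
    """
    kernel = support_vectors @ x  # linear kernel: K(x_i, x) = x_i . x
    return float(coef @ kernel + bias)

def classify(x, models):
    """Return the class whose decision value is largest.

    `models` maps a class name to (support_vectors, coef, bias); this layout
    is an assumption made for the sketch.
    """
    scores = {name: decision_value(x, *model) for name, model in models.items()}
    return max(scores, key=scores.get)

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# Toy two-class example with one support vector per class.
models = {
    "cow": (np.array([[1.0, 0.0]]), np.array([1.0]), 0.0),
    "goat": (np.array([[0.0, 1.0]]), np.array([1.0]), 0.0),
}
label = classify(np.array([2.0, 0.5]), models)  # "cow": score 2.0 beats 0.5
```

For parameter selection, each candidate value of C would be trained on the `train` indices of every fold and scored on the matching `test` indices, keeping the value with the best average score.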

E. Performance Evaluation
The evaluation procedure builds a confusion matrix and then calculates the accuracy, specificity, sensitivity, and precision using (12), (13), (14), and (15), respectively, where TP is true positive, FP is false positive, TN is true negative, and FN is false negative. To handle the imbalance between positive and negative test data, the authors use their ratio as a weight in the calculation of these statistical performance measurements to balance the influence of the dataset.
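As an illustration of how the four measures follow from a confusion matrix, the sketch below computes them per class in their standard unweighted form. The function name `per_class_metrics` is hypothetical, and the ratio weighting mentioned above is deliberately left out because its exact form is not given in the text.

```python
import numpy as np

def per_class_metrics(confusion):
    """Per-class accuracy, specificity, sensitivity, and precision.

    confusion[i, j] is the number of class-i images predicted as class j.
    Unweighted definitions; a class that is never predicted or never present
    would produce a division by zero and needs separate handling.
    """
    cm = np.asarray(confusion, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)              # correctly predicted images of each class
    fn = cm.sum(axis=1) - tp      # images of the class predicted as another class
    fp = cm.sum(axis=0) - tp      # images of other classes predicted as this class
    tn = total - tp - fn - fp     # everything else
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

metrics = per_class_metrics([[8, 2], [1, 9]])
overall = {name: values.mean() for name, values in metrics.items()}
```

Averaging each measure over the classes, as in `overall`, gives one summary value per measure for a whole classification scheme.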

IV. RESULT AND DISCUSSION
The proposed method runs on a laptop with an Intel i7 processor, 16 GB of RAM, and an NVIDIA GTX 1050 graphics card with 4 GB of memory. Performance evaluation is done using MATLAB.
The proposed method is tested on classifying leather into five types: monitor lizard, crocodile, sheep, goat, and cow skin.
The training data consist of 1000 leather images, i.e. 200 images for each category. The authors conduct the performance evaluation on 3157 leather images. Input tanning leather images are taken with a camera at a distance of 15-50 cm from the leather and saved as images of 512x512 pixels. Fig. 3 shows samples of the leather images used in this research. As Fig. 3 shows, tanning leathers may have different textures within the same category, caused either by the tanning process or by the skin being taken from different parts of the animal body, e.g. reptiles have different skin color and texture on their belly and back.

A. Result of Hand-Crafted Feature Extraction
For hand-crafted feature extraction, input images are converted to grayscale. The authors extract 24 global features from each input image: 4 color moment features (mean, standard deviation, skewness, and kurtosis) and 20 statistical texture features from the GLCM, namely contrast, energy, correlation, homogeneity, and entropy, each measured at 4 offsets. The features are standardized and then classified using SVM. The confusion matrix and the statistical measurements per category for this hand-crafted leather classification scheme are shown in Table I. From Table I,

B. Result of Feature Extraction using Pre-Trained CNN
For feature extraction using the pre-trained CNN, the input image is resized to 227x227 to fit the CNN input layer. The authors use the activations of the eighth layer of AlexNet, which is the last fully connected layer before the output, as features in order to obtain rich features for classification. Another reason is compactness: the eighth layer produces 1000 features, whereas the other fully connected layers produce 4096 features. The features are classified using SVM. The confusion matrix and the statistical measurements per category for the leather classification scheme using pre-trained CNN are shown in Table II. The statistical measurements per category in Tables I and II are averaged to obtain the overall performance of each scheme. The authors also measure the computational time and compare the results of both classification schemes. As Table III shows, the hand-crafted feature extraction scheme performs worse than the pre-trained CNN. The computational time for hand-crafted features is also longer than for the pre-trained CNN, despite the small number of features. In hand-crafted feature extraction, the input image must first be transformed into GLCM matrices before the statistical texture features are calculated, whereas with the pre-trained CNN the features are obtained directly from the input image as the activations of a specific fully connected layer.
The outstanding classification performance of the proposed method is mainly due to the pre-trained CNN, in this case AlexNet, which was designed for a harder classification problem with 1000 object categories and trained on 1.2 million images. Therefore, a smaller and simpler five-category classification task can achieve good performance when a pre-trained CNN is used as the feature extractor.

V. CONCLUSION
In this research, a method to classify tanning leather images is proposed. By recognizing the type of animal skin, the classification helps ensure the quality of leather craft materials produced by leather tanning factories and used by leather craft industries. The method takes a tanning leather image as input and performs feature extraction using a pre-trained CNN. The classification into five leather categories is done using SVM. The performance evaluation shows that the proposed method classifies leather with good accuracy and precision. The proposed method is also superior to the hand-crafted classification scheme. For future work, the authors aim to add more types of leather to be classified, improve the classification performance, reduce the computational time, and implement the method in a mobile application.