Automatic Tariff Classification System using Deep Learning

The tariff fraction is the universal way of identifying a product. It determines the tariff that the product must pay when entering or leaving a country, in this case Mexico. Coffee is difficult to identify correctly because its variants are not distinguishable at first glance, which can cause confusion and lead to the wrong tariff being charged. The main objective of this project was therefore to develop a system based on Deep Learning models that identifies the tariff code of coffee for import or export through the analysis of digital images in real time, and that automatically generates a general report with this information for the customs broker. The developed system speeds up the assignment of the tariff fraction and supports its correct assignment, avoiding confusion with other products and incorrect collection of the tariff. Although the system currently focuses on Mexico, it can be used in all customs offices, since the tariff fraction is universal. The models were evaluated with cross-validation, obtaining an effectiveness of more than 80%, and the tariff fraction assignment model had an effectiveness of 90%.


I. INTRODUCTION
The Harmonized System (HS-code) of Tariff Nomenclature, created by the World Customs Organization, is widely applied to standardize the exchange of internationally traded goods. The code consists of six digits that are common to all countries and is extended into tariff fractions. The tariff fraction is an eight-digit code that represents a good within the Tariff of the General Import and Export Tax (TIGIE). The assignment of the code establishes the tariff regulations that must be satisfied for the import or export of the goods. Incorrect assignment of the HS-code can result not only in non-compliance with tariff regulations and restrictions, but also in fines, infractions for customs agents or even seizure of the goods [1]. Coffee beans are among the products that are difficult to classify correctly for HS-code assignment, because of the varieties that exist and the different degrees of roasting. In general, coffee beans can be classified into four varieties: Arabica, Excelsa, Liberica and Robusta [2]. However, classification can be very complicated for people unfamiliar with coffee, and even for an expert in the domain, because the shape and color of the different varieties look similar upon visual inspection. In the field of computational learning, different models have been evaluated to address the automatic classification of coffee beans [3], [4], [5]. This is especially useful for the correct assignment of HS-codes, which avoids the negative consequences mentioned above. For this research, a modified basic deep learning architecture [6] was used, in which two layers of feature detection were incorporated. Two databases were assembled. The first consists of 200 images, 50 for each of the main coffee bean varieties: Arabica, Excelsa, Liberica, and Robusta. The second includes 60 images for three roasting categories: Not Roasted, Roasted and Dark Roasted.
The model built was incorporated into a web-based system to assist in the process of correctly assigning tariff fractions. The performance achieved was 90% accuracy.
The paper is organized as follows: Section II reviews studies related to the application of machine learning models for HS-code assignment, as well as works that use this type of model to classify coffee beans for different purposes. Section III describes in detail the methodological strategy applied and the results obtained, together with their discussion. The final part presents conclusions and future work.

II. RELATED WORK
In the international coffee marketing industry, several studies have addressed the correct assignment of HS-codes in general, the identification of defects in coffee beans, and the correct class assignment of coffee beans.
In 2015, a paper was published on the trading system for Singapore Customs, which classified products and assigned HS-codes based on the text description of the declarations. The technique used was a Background Net. After the pre-processing stage, the actual transaction dataset consisted of 40,861 records from chapter 22 and 83,830 records from chapter 90 of the HS-code system. The data were split 60/40 for the training and testing phases, respectively. The results showed that the model had an accuracy greater than 90% for chapter 22 but a significantly lower accuracy, around 70%, for chapter 90 [7]. In another study, several machine learning models were explored to predict the HS-code based on commodity descriptions entered by customers. The study followed the cross-industry data mining process methodology. The dataset was provided by Dubai Customs through an Artificial Intelligence (AI) hackathon competition held in October 2019. It consisted of 22,346,194 records, each with two attributes: the Harmonized System Code (HS-code) and the description entered by the user. The machine learning models applied were Naïve Bayes, K-Nearest Neighbor, Decision Tree, Random Forest, Linear Support Vector Machine and AdaBoost. The linear support vector machine achieved the highest accuracy, 76.3%. The authors propose that a hierarchical prediction could be built from the HS-code header until all the subsections of the HS-code are identified [8]. The automatic classification of Harmonized System codes (HS-codes) based on the descriptions provided by users is also addressed in [9]. Three different Deep Learning architectures were evaluated in the experiment: hierarchical logistic regression, Neural Machine Translator (NMT) and Long Short-Term Memory (LSTM), the latter two with and without hierarchical loss. Thus, five models were analyzed.
The dataset consisted of eight months of shipments to a country via the DHL network, which included 1,156 million records. However, records of mask/KN95 shipments and blood samples were excluded so that the results would not be biased by the unusual shipment situation caused by the COVID-19 pandemic. The results showed that the NMT model with hierarchical loss obtained the best performance, reaching an accuracy of 85%. According to [10], deep learning can be defined as a cascade that performs non-linear processing to learn multiple levels of data representations. In this context, the study by Lee et al. reports the application of a Deep Learning model to assist in the assignment of HS-codes in collaboration with the Korea Customs Service. In the experiment, 129,084 cases were evaluated and the top-3 suggestions made by the model achieved an accuracy of 95.5% [11].
Studies have also been carried out on the topic of coffee bean classification. In [3], measurements in the CIE L*a*b* color space are used to characterize green Arabica coffee beans provided by growers in the State of Minas Gerais, Brazil. The algorithms used were a Multi-Layer Perceptron feed-forward Artificial Neural Network (ANN) and Naïve Bayes. The authors report an accuracy of 100%. The data set consisted of 20 samples of 50 grams, and 30 images per color were obtained from the following groups: off-white, green, cane green and blue-green. In another study, feed-forward back-propagation neural network and K-nearest neighbors (KNN) algorithms were compared to classify coffee beans from different villages in Cavite, Philippines. The varieties included were Robusta, Excelsa and Liberica. The dataset consisted of 255 images, which were divided into 195 samples for training and 60 for the testing phase. Four morphological characteristics were used in the experiments: area, perimeter, equivalent diameter and percent roundness. The accuracy achieved was 96.66% with the ANN and 82.56% with KNN for k = 4 [4]. In [5], 255 samples from the province of Cavite, Philippines were used. The data set included 85 samples per species, considering three species of green coffee: Robusta, Liberica and Excelsa. The images were converted to grayscale to extract morphological characteristics. The experiment included 22 classifier algorithms from 5 families: Decision Trees, Discriminant Analysis, Support Vector Machine, K-Nearest Neighbor and Ensembles. The results showed that the Coarse Tree algorithm achieved the best result, with 94.1% accuracy. Although an accuracy of 96.6% was obtained in a previous work by the same author, it is explained that the Coarse Tree algorithm required less time in the training phase.
Other research aimed to detect and classify the purity of Luwak green coffee beans into the following categories: very low (0-25%), low (25-50%), medium (50-75%), and high (75-100%). The research compared the performance of four pre-trained convolutional neural network (CNN) models: SqueezeNet, GoogLeNet, ResNet-50, and AlexNet. GoogLeNet obtained the best result in the training and validation steps and achieved an accuracy of 89.65% [12]. In the study by Wallelign et al., images of coffee bean samples were collected at the Jimma Grading Center in Ethiopia. The beans from the 300 gram sample used for raw evaluation were used to prepare a dataset of 1266 images of coffee beans from 12 quality grades. The dataset was divided into three sets for training (70%), validation (10%) and testing (20%), and an accuracy of 89.1% was achieved on the test dataset [13]. In [14], 74 coffee beans of different origins were analyzed and separated into two species, Arabica and Robusta, based on their fatty acid composition. The study was based on a Deep Learning approach that converted the raw data into Z-score format. The authors comment that different types of conversion can affect the classification results for coffee species. Statistical analysis and Linear Discriminant Analysis were also used to extract robust features that influenced the Machine Learning process. The author of [15] determines the species of coffee bean using the Gray Level Co-Occurrence Matrix (GLCM) model with the help of artificial neural networks, creating a new method to distinguish the species Arabica, Excelsa and Robusta. 120 images were used for training and 60 images for testing.
This author concluded that image processing is effective in determining the quality of coffee bean variants. The ANN classifier had an experimental accuracy of 97.06%, which shows that it is reliable for classifying the species of coffee beans.
On the other hand, several studies have reported different methods for identifying defects in coffee beans. For example, the work of Akbar and collaborators measured the quality of green Arabica coffee in order to classify the beans into five different levels of defects. Color and texture features of the coffee beans were extracted through a color histogram and a local binary pattern. The machine learning techniques used were Random Forest and K-Nearest Neighbors, which achieved accuracies of 87.87% and 80.47%, respectively. The experiment included a balanced data set with a total of 900 RGB images for the five classes (defect levels). The data were divided in a 66/33 ratio for the training and testing stages. The authors mention that blurred images generated inconsistent values in feature extraction and could lead to incorrect predictions by the classification algorithms [16]. In [17], four CNN models from the Keras framework were evaluated. The coffee roasting frames were divided into three classes, Not Yet, Accepted, and Rejected, following the development of the coffee bean during the roasting process. The dataset consisted of 4,464 images in the Not Yet class, 3,168 in Accepted, and 3,312 in Rejected. The dataset prepared for deep learning model training had a total of 10,944 images, which were split into 60% training, 20% validation, and 20% test. The final result of the deep learning model training showed an accuracy above 97%. Other research, also focused on the development of more efficient methods to select coffee beans based on shape and, particularly, on color descriptors, achieved accuracy values above 88%. The techniques applied to assess coffee bean defects were Support Vector Machine (SVM), Deep Neural Network (DNN) and Random Forest (RF). Images of each class were taken separately, for a total of 635 samples (coffee beans) [18].
In another work, an AlexNet-based deep learning model was applied to detect eight types of defects in coffee beans with an accuracy of 95.1%. The original dataset included 3261 samples, but because the initial set contained few samples with defects, rotation and data augmentation were used to obtain 7203 images [19]. Chou and collaborators propose a model based on Deep Learning to inspect coffee beans and a generative adversarial network (GAN) for structured data augmentation. The proposal aims to contribute to intelligent agriculture by applying these techniques to remove beans with defects categorized by the SCAA (Specialty Coffee Association of America) and to minimize the human effort involved in labeling coffee beans. The model reports an accuracy of over 80% [20]. Previous studies have shown the applicability of image processing and machine learning techniques. In [21], techniques such as convolutional neural networks (CNNs), support vector machines (SVMs), and k-nearest neighbors (KNNs) are applied to the classification of peaberries and normal beans. Separating peaberries from normal coffee beans increases the market value of both. According to the authors, the combination of the CNN and a Raspberry Pi 3 holds the promise of an inexpensive peaberry and normal bean sorting system for developing countries. The trained CNN could classify approximately 13.77 coffee bean images per second with a classification accuracy of 98.19%. In [22], an intelligent coffee bean quality inspection system based on deep learning (DL) and computer vision (CV) was developed to assist operators in detecting defects, including mold, fermentation, insect bites, and crushed beans. An open-source dataset of coffee bean images was used for testing.
The dataset contains 4626 images of green coffee beans under the same light source, of which 2150 are images of good beans and 2476 are images of defective beans. In the experiment, the dataset was divided into a training set of 4000 images and a testing set of 626 images. The accuracy of the ResNet-18 model reached about 93% on the testing set. The student model achieved an accuracy of 79%. The author of [23] develops a system that uses convolutional neural networks to detect defective coffee beans, for example beans that have some type of imperfection. The author also emphasizes that five classifications of defective and non-defective beans were performed; the CNN obtained the best performance for all types of defects, with a classification precision of more than 90%.
Finally, we identified some research focused on determining other characteristics of coffee, such as flavor and quality, among which we can cite the following. The aim of [24] is to investigate the feasibility of training machine learning (ML) and deep learning (DL) models to predict the flavors of specialty coffee using near-infrared spectra of ground coffee as the input. The authors mention that effective models provided moderate prediction for seven flavor categories based on 266 samples. The methods applied were a support vector machine and a deep convolutional neural network (DCNN), which achieved similar performance, with recall and accuracy of 70-73% and 75-77%, respectively. Another study describes a method to classify the geographic origin of coffee beans, comparing popular machine learning methods, including convolutional neural network (CNN), linear discriminant analysis (LDA), and support vector machine (SVM), to obtain the best model. Principal component analysis (PCA) and a genetic algorithm (GA) were applied to reduce dimensionality for LDA and SVM. Ninety-six samples of Arabica coffee beans, representative of three different geographical origins, were analyzed in the study. The results showed an accuracy of 90% on a prediction set using the CNN method [25]. The author of [26] mentions that evaluating the color of green coffee beans is an important process for defining their quality and market price, and that this process is normally carried out by visual inspection, which has some limitations. To address this, they develop a system capable of obtaining CIE (Commission Internationale de l'Eclairage) measurements of the coffee beans and, in turn, classifying the beans according to their color.
Artificial Neural Networks (ANN) were used as the transformation model and the Bayes classifier was used to classify the coffee beans into four groups: whitish, cane green, green, and bluish-green. The neural network models achieved a generalization error of 1.15% and the Bayesian classifier was able to classify all samples into their expected classes (100% accuracy).
The review shows that some of the related studies apply machine learning models to the assignment of HS-codes, but not specifically to the classification of coffee beans. In some cases, the accuracy is not higher than 70%. Experiments using Deep Learning techniques or hierarchical models are also reported, some even considering the top three classification suggestions to increase their performance. The coffee-related studies, however, focus on coffee purity categories or defect levels for a given variety of coffee beans, or address only the roasting process. Others are oriented towards intelligent agriculture, the discrimination of peaberries and normal coffee beans, the distinction of coffee flavors, or the identification of the geographical origin of the beans. It should be noted that although one of the works reported 100% accuracy, this may have been due to overfitting, because only 20 samples were used. We therefore propose the use of a CNN model to address the problem of this research: classifying the variety of coffee beans from digital image analysis. Some studies that report image classification with Deep Learning models are the following. In [27], the authors mention that computer vision tools have had a strong impact on industry, given their varied applications and their ability to automate complex and demanding processes. Examples of such applications range from the recommendation of clothing in online purchases to the characterization and generation of clothing statistics in physical stores. In [28], product recognition is treated as a particular research issue related to object detection; however, product image recognition in practice is still far from perfect. The relevance of our research is thus to propose a CNN model that considers the assignment of the tariff fraction for four varieties of coffee beans. There are two important issues to consider in the application of deep learning models.
Deep learning algorithms have been used mainly in applications where the data sets were balanced or in which, as a workaround, synthetic data was added to achieve balance. Another concern is that deep learning relies predominantly on large amounts of training data. It is also worth noting that the rise of deep learning has been strongly supported by major IT companies (e.g. Google, Facebook, and Baidu), which own a large number of patents in the field and have substantial backing for data collection and processing.
Based on the above, the following differences between this work and those previously described can be highlighted:
• We do not seek to identify imperfections, a task that focuses on identifying shape features, but the work of [23] gives evidence that it is possible to extract features from images of coffee beans.
• The work of [26] focuses on distinguishing colors that are identifiable with the naked eye, which is not possible when images of already roasted coffee beans are analyzed, as is common when importing or exporting coffee. That is, our problem is not reduced to distinguishing different categories of maturation of a coffee bean, but consists of distinguishing the variety of coffee from the analysis of a bean with a specific maturation.
• The work of [15] uses an ANN; in our case we propose to take advantage of the benefits of a CNN, in addition to including one more variety of coffee.

III. RESULTS
The principal results of this research are reported in four parts: the data set of coffee bean images, the model that identifies the variety of coffee beans, the model that determines the tariff fraction, and the system that combines the models and generates the documentation for the specific variety of coffee identified.

A. Dataset of Coffee Beans Images
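We searched for images of coffee beans in several Internet sources that specify the variety shown, and we also obtained real images of coffee beans from a coffee bean shop. From these we built two data sets. As described in Section I, the first consists of 200 images, 50 for each variety (Arabica, Excelsa, Liberica, and Robusta), and the second includes 60 images for three roasting categories (Not Roasted, Roasted and Dark Roasted).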

B. Deep Learning Model for Coffee Bean Variety Classification
Deep learning (DL) is a subset of a larger family of machine learning algorithms based on data representation. The first DL model assigns the variety of coffee that corresponds to a given image of coffee beans: Arabica, Excelsa, Liberica or Robusta. For this model, as for the one in the following section, we used the basic deep learning CNN architecture [6] with two feature detection layers. The base model is presented in Fig. 1 and takes input images of 150 × 150 pixels. The first convolution layer, which uses 6 × 6 filters, expands the information to a depth of 32, and pooling reduces the visual information to a quarter of its size (75 × 75 pixels). In the next stage, the depth of the convolution was doubled, although with a smaller filter (4 × 4). Pooling was applied again, reducing the information from the previous layer to a quarter of its size (37 × 37 pixels). After that, the flattening layer was applied and the information was reduced to a vector of size 256. Finally, a Softmax layer was applied to give the probabilities for each of the four classes (varieties of coffee) to be predicted. Some changes were made because satisfactory results were not obtained with the base architecture. Two layers were modified, one convolution layer and one reduction layer. In addition, an Adam optimizer, configured as part of the classification layer, was chosen because it reduces the error more effectively. More details of the layers of the architecture used for the developed deep learning model are given in Table I. In more general terms, the first convolution layer processes images of coffee beans with a given height and width (resized to 150 × 150 px) and detects basic characteristics of the images, such as curves, lines and basic textures.
The first reduction layer uses max pooling with a given pool size; it reduces the output of the first convolution layer in order to gain some scale invariance with respect to the input images. The second convolution layer is essentially the same as the first, except that no input height or width is specified and no activation function is assigned. The second reduction layer also uses max pooling, yielding a 37 × 37 output, to gain some additional invariance; it reduces the output of the second convolution layer. The classification layer first flattens the image, which is by now very deep and very small, into a one-dimensional vector that contains all the information of the neural network. It also contains 256 neurons with the ReLU activation function and applies dropout, which keeps 50% of these neurons active at each step (each time the information is processed). This is done so that the model does not learn only one specific way to classify coffee and can adapt to new information (a form of overfitting prevention). The number of coffee classes in the model is four. In total, the model required 22,467,300 parameters to classify the variety of coffee beans. The two convolutions required 3,488 and 32,832 parameters, respectively. Once the network is flattened, the number of parameters to learn is 22,429,952. The last classification layer requires only 1,028 parameters to detect four classes. A decimation function was applied to fit the images to the model input of 150 × 150 × 3 pixels.
Tables I and II.
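The layer sizes and parameter counts reported above can be checked with a few lines of arithmetic. The sketch below is not the authors' code: it assumes 2 × 2 pooling with stride 2, convolutions that preserve the spatial size ("same" padding), 64 filters in the second convolution (the "doubled" depth), and a bias term per filter or neuron. Under those assumptions the totals match the reported figures.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2-D convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


# Spatial size after each pooling step (assumed 2 x 2 pooling, stride 2).
side_after_pool1 = 150 // 2               # 75
side_after_pool2 = side_after_pool1 // 2  # 37

conv1 = conv2d_params(kernel=6, in_channels=3, filters=32)   # RGB input
conv2 = conv2d_params(kernel=4, in_channels=32, filters=64)  # assumed 64 filters
flattened = side_after_pool2 * side_after_pool2 * 64
hidden = dense_params(flattened, 256)
output = dense_params(256, 4)             # four coffee varieties

total = conv1 + conv2 + hidden + output
print(conv1, conv2, hidden, output, total)
```

Almost all of the parameters sit in the connection between the flattened feature map and the 256-neuron layer, which is why changing the convolution filters barely affects the total.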

C. Deep Learning Model for the Identification of Tariff Fraction
The goal of this model is to classify the coffee into the following classes: Not Roasted, Roasted and Dark Roasted. Once the coffee is classified into one of these, the model assigns a tariff fraction, which is obtained from a database, and generates detailed information about the product. In contrast to the previous model, the base CNN architecture was used, as shown in Table II (note that it is very similar to the corresponding architecture presented in Fig. 1). This is because, in this case, the classification focuses on color identification. In this second model, only the first convolution layer changes, reducing its number of parameters to 1,568 (the first model's contains 3,488). The final number of parameters in this model was therefore 22,465,380. These 22 million parameters were trained using a traditional Adam optimizer from the TensorFlow library in the Python programming language. Since the training images formed a small dataset (due to the few examples found in the literature, even including our own examples), the model was not expensive to train locally. The effectiveness of the model (Section III-D) was measured using a cross-validation technique, which could be done efficiently because of the small number of image examples mentioned above.
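The reported count of 1,568 parameters for the first convolution constrains its filter size. The following sketch, which is an inference from the reported numbers rather than something stated in the text, assumes 32 filters over a 3-channel input with one bias per filter and solves for the kernel side. It also compares the two reported totals.

```python
import math


def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2-D convolution."""
    return kernel * kernel * in_channels * filters + filters


# Figures reported in the text.
roast_first_conv = 1568
roast_total = 22465380
variety_first_conv = 3488
variety_total = 22467300

# Solve k*k*3*32 + 32 = 1568 for k (assumed 32 filters, RGB input).
weights_per_filter = (roast_first_conv - 32) // 32   # 48
kernel = math.isqrt(weights_per_filter // 3)         # 4

# Difference between the two models' totals.
total_difference = variety_total - roast_total       # 1920
print(kernel, total_difference)
```

Under these assumptions the first convolution of the roast model would use 4 × 4 filters. The difference between the totals equals the difference between the two first-convolution counts, so the reported total is consistent with every other layer, including a 1,028-parameter output layer, being counted unchanged; a three-unit output layer would instead contribute 771 parameters.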

D. Models Effectiveness Report
To validate the performance of the models we used cross-validation with 10 iterations, given the small size of the data sets; 80% of each data set was used to train the model and 20% for validation. This was applied to both models.
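The text gives both a count of 10 and an 80/20 proportion without specifying the exact splitting procedure. One reading that is consistent with both is a repeated stratified hold-out, sketched below in plain Python; this is an assumption, and the function name, seed and label list are illustrative only.

```python
import random
from collections import defaultdict


def stratified_splits(labels, n_runs=10, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) per run, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_runs):
        train_idx, val_idx = [], []
        for label in sorted(by_class):
            indices = by_class[label][:]
            rng.shuffle(indices)
            n_val = round(len(indices) * val_fraction)
            val_idx.extend(indices[:n_val])
            train_idx.extend(indices[n_val:])
        yield sorted(train_idx), sorted(val_idx)


# Labels mirroring the variety data set: 50 images per class.
labels = [v for v in ("Arabica", "Excelsa", "Liberica", "Robusta") for _ in range(50)]
splits = list(stratified_splits(labels))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 200 images this gives 160 training and 40 validation images per run, with 10 validation images per variety. A strict 10-fold scheme would instead hold out 10% per fold, so the averages in the tables depend on which procedure was actually used.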
The results obtained (averages) are shown in Tables III and IV, which give the performance of the model that identifies the coffee class and of the model that identifies the tariff fraction, respectively. The tables present precision, recall and F1-score measures, as well as the micro, macro and weighted averages. The results in Table III show that the model has a greater error when identifying the Arabica class, probably because it is visually very similar to the Excelsa class. Fig. 2 shows the confusion matrix obtained for the results in Table III. The matrix shows more clearly the errors between the Arabica and Excelsa classes. Because the classes are balanced, the micro, macro and weighted measures were similar, and we can take the micro measure as the global accuracy of the model. The model thus has a global accuracy greater than 80%, and the performance measures for each class show that the model is able to distinguish the coffee bean classes with good performance even though they visually have the same color and shape. The results in Table IV show that, according to the per-class performance measures, the model has a greater error when identifying the Roasted class. This is probably because the roasted tone is sometimes identical to the dark roasted tone, which confuses the model. Fig. 4 shows the confusion matrix for the results in Table IV. The difficulty in distinguishing coffee beans with "roasted" and "dark roasted" levels can be seen more clearly there. As with our other model, we can take the micro measure as the global accuracy, 90%. This result is comparable to the similar work of [26], whose objective is to identify different types of coffee beans according to their color and which reported a global performance of 97.06%.
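The relation between the micro, macro and weighted measures and the global accuracy can be illustrated with a small computation. The confusion matrix below is hypothetical (it is not the one in Fig. 2 or Fig. 4); rows are true classes and columns are predicted classes, with ten samples per class so that the classes are balanced.

```python
def per_class_recall(cm):
    """Recall of each class: correct predictions over row total."""
    return [cm[k][k] / sum(cm[k]) for k in range(len(cm))]


def per_class_precision(cm):
    """Precision of each class: correct predictions over column total."""
    n = len(cm)
    return [cm[k][k] / sum(cm[r][k] for r in range(n)) for k in range(n)]


def global_accuracy(cm):
    """Correct predictions over all predictions (equals the micro average)."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical balanced confusion matrix, for illustration only.
cm = [
    [7, 3, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 9, 1],
    [0, 1, 0, 9],
]

recalls = per_class_recall(cm)
supports = [sum(row) for row in cm]
macro_recall = sum(recalls) / len(recalls)
weighted_recall = sum(r * s for r, s in zip(recalls, supports)) / sum(supports)
accuracy = global_accuracy(cm)
print(macro_recall, weighted_recall, accuracy)
```

In single-label classification the micro-averaged precision, recall and F1-score all equal the accuracy. When every class has the same number of samples, the macro and weighted recall coincide with it as well, which is why the three averages come out similar on balanced data.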

E. Web System
Finally, a web system was built in which the Deep Learning models were integrated. The system automatically assigns the tariff fraction of coffee beans from the analysis of a digital image. The image is passed to the first model, which determines the variety of coffee; the second model then determines whether the coffee beans are roasted, dark roasted or not roasted in order to assign the corresponding tariff fraction. Finally, a document is automatically generated that contains the image uploaded by the user and the tariff item broken down in detail. Fig. 3 shows images of the system interfaces and the general process for using it.
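The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the system's implementation: the function and class names are hypothetical, the models are replaced by stubs that return fixed probabilities, and the tariff table holds placeholder strings rather than real tariff fractions, which the source obtains from a database.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

VARIETIES = ("Arabica", "Excelsa", "Liberica", "Robusta")
ROAST_LEVELS = ("Not Roasted", "Roasted", "Dark Roasted")

# Placeholder values only; real tariff fractions would come from a database.
PLACEHOLDER_TARIFFS = {
    "Not Roasted": "<tariff fraction for not roasted coffee>",
    "Roasted": "<tariff fraction for roasted coffee>",
    "Dark Roasted": "<tariff fraction for dark roasted coffee>",
}


@dataclass(frozen=True)
class TariffReport:
    variety: str
    roast_level: str
    tariff_fraction: str


def top_label(probabilities: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose probability is highest."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


def build_report(
    image: object,
    variety_model: Callable[[object], Sequence[float]],
    roast_model: Callable[[object], Sequence[float]],
    tariff_table: Mapping[str, str] = PLACEHOLDER_TARIFFS,
) -> TariffReport:
    """Run both classifiers on the image and look up the tariff fraction."""
    variety = top_label(variety_model(image), VARIETIES)
    roast_level = top_label(roast_model(image), ROAST_LEVELS)
    return TariffReport(variety, roast_level, tariff_table[roast_level])


# Stub models standing in for the trained networks.
def stub_variety_model(image: object) -> Sequence[float]:
    return [0.1, 0.2, 0.6, 0.1]


def stub_roast_model(image: object) -> Sequence[float]:
    return [0.05, 0.15, 0.8]


report = build_report(object(), stub_variety_model, stub_roast_model)
print(report)
```

Keeping the lookup table separate from the classifiers means that a change in tariff fractions only requires updating the database, not retraining either model.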

F. Discussion
This article presented the structure and model of a system that determines the type of coffee beans as well as their roasting level in order to assign the tariff fraction, as a support tool in customs that can be used by customs brokers. We noted that there are few visual datasets of coffee beans: concrete examples exist, but not many image examples. This motivated the creation of a small collection of images produced by the authors of this work, which strengthened the results of the computer vision models presented. The computer vision models are based on Deep Learning techniques that, unlike the state of the art, use a lighter architecture in order to achieve finer recognition of coffee bean patterns that are usually difficult to identify with the naked eye. The results of the models were used to estimate the tariff fraction. Delivered through a web system, these results facilitate the manual task of estimating the tariff fraction for people who are not familiar with agronomy and, in particular, with coffee bean varieties or their roasting level. In addition, although the main goal of this project is to facilitate the process of bringing coffee into and out of our country, the system is web-based and can therefore be used anywhere by a wide range of users, requiring only an Internet connection.

IV. CONCLUSION AND FUTURE WORK
We have presented a novel system that automatically distinguishes the variety of coffee (Arabica, Excelsa, Liberica, or Robusta) from the analysis of a bean with a specific maturation, with an accuracy greater than 80%. This is relevant to the process of importing or exporting coffee in any country and helps to avoid the negative consequences of incorrect classification. Providing the models in a usable system is one of the main differences from similar works, which only report the development of machine learning models and do not report their implementation in a tool that is easy for end users to use.
We described the deep learning models built and their accuracy: 84.1% for the coffee bean variety classification model and 90% for the tariff fraction identification model. These results are comparable with research in the state of the art and support the feasibility of applying these models to the tariff fraction assignment process.
The base architecture used for the development of the Deep Learning models was very useful, since few changes had to be made for the coffee classification model and no changes were made for the tariff fraction assignment model. The assessment made by end users was good, and they consider that the system meets the needs of customs agents and that it is fast and innovative. We therefore consider that we not only obtained Deep Learning models, but also integrated them into a web system so that they can be used in a simple way.
Finally, as future work, we plan to improve the performance of both models, and specifically of the coffee classification model, first by increasing the size of the datasets and second by making further changes to the base CNN architecture, for example adding a third convolutional layer, or even by trying other architectures such as LeNet or AlexNet.