A CNN-Based Approach for Handwritten Character Identification of Telugu Guninthalu Using Various Optimizers

—Handwritten character recognition is one of the most critical and challenging areas of research in image processing. It concerns a computer's ability to detect handwritten input from various sources, such as paper documents, images, touch screens, and other online and offline devices. Handwriting recognition for Indian languages such as Hindi, Tamil, Telugu, and Kannada has received less attention than for English or for Asian languages such as Japanese and Chinese. In this work, the Adaptive Moment Estimation (ADAM), Root Mean Square Propagation (RMSprop), and Stochastic Gradient Descent (SGD) optimization methods, employed in a Convolutional Neural Network (CNN), produced good recognition accuracy and training and classification times for Telugu handwritten character recognition. CNNs make it possible to overcome the limitations of classic machine learning methods. We used handwritten Telugu guninthalu from numerous writers to construct our own data set as input to the proposed model. The RMSprop optimizer outperformed the ADAM and SGD optimizers, reaching an accuracy of 94.26%.


I. INTRODUCTION
In today's world, the internet is brimming with images and video, providing ample opportunity to build research applications for image and video analysis [1] and to educate people about more complex material and techniques. With the rise of Artificial Neural Networks (ANNs), machine learning has advanced significantly in recent years, and these ideas extend models beyond classical machine-learning tasks into other domains. The Convolutional Neural Network (CNN) architecture is considered one of the most inventive. As plain ANNs fell short in object recognition and image classification, the benefits of using CNNs in image processing became clear, and as better CNNs became available, research applying them to image processing domains grew dramatically [2][3][4]. CNNs have seen great success in various domains, including computer vision, natural language processing, and speech recognition.
CNN is one of the most widely used machine learning models and has been extended, through suitable mathematical representations, to a wide range of visual image applications, object classification, and audio identification challenges. It is a trainable multi-layer network structure consisting of several layers [5]. Raw pixel values may be used as input to the network instead of the hand-crafted feature vectors often employed in machine learning. Although there are many different kinds of CNNs (Fig. 1), they share the same basic structure: a convolutional layer, a pooling layer, and a fully connected layer.

1) Convolution layer:
This layer filters the images, extracting characteristics that are used to identify matching regions during testing. Even for large images, a convolution requires only a small number of parameters. Using a filter, or kernel, the input data is transformed into a feature map for use by the CNN.
2) Pooling layer: This layer receives the extracted features. It shrinks larger images while keeping the most critical information, reducing the picture spatially to minimize the number of parameters and computations in the model. Max pooling is a typical pooling strategy: it keeps the greatest element from each window of the feature map covered by the filter.
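As a minimal illustration of the max pooling described above, the following NumPy sketch applies a 2 × 2 window with stride 2 to a small, hypothetical feature map:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep the largest value in each window."""
    h, w = feature_map.shape
    # Crop to an even size, then view the map as 2x2 blocks and take each block's max.
    fm = feature_map[: h - h % 2, : w - w % 2]
    return fm.reshape(fm.shape[0] // 2, 2, fm.shape[1] // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [0, 8, 3, 4]])
pooled = max_pool_2x2(fm)  # a 4x4 map shrinks to 2x2, keeping the best-fit values
```

Each 2 × 2 window contributes its maximum, so the spatial size halves while the strongest activations survive.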
3) Fully connected layer: High-level filtered images are fed in and categorized using labels in this final layer. Every neuron in this layer is connected to every neuron in the previous layer. Alternating convolution and pooling layers are common in most designs.

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 4, 2022, www.ijacsa.thesai.org

II. LITERATURE REVIEW
CNN performs well in various applications, inspiring academics to apply it in key fields such as natural language processing (NLP), image classification and face recognition, predictive analytics, and so on. P. V. Ramana Murthy et al. [6] built a model that recognizes online handwritten Telugu letters for various domains and companies, yielding 98.3% accuracy, exceeding their expectations. P. Sujatha et al. [7] used CNN architectures to identify a few deep learning strategies for detecting Telugu and Hindi scripts; they also developed a new architecture for identifying low-level textual properties of handwritten characters. Buddharaju Revathi et al. [8] conducted a survey on Optical Character Recognition (OCR) for the Telugu language, demonstrating the progress of processing the characters stage by stage and performing operations such as segmentation and processing, which resulted in higher accuracy.
B. Hari Kumar et al. [9] used data from several sources to conduct script identification in Telugu. Konkimalla Chandra Prakash et al. [10] used CNN for Telugu script recognition. They supplied a list of Telugu typefaces, a client-server solution for the algorithm's online deployment, and deep learning-based OCR techniques in their study; their segmentation method can be improved so that each character, along with its gunintham and vattu, is segmented. Chirag I. Patel et al. [11] developed a technique for identifying characters in a given digitalized text and evaluating the effects of model variations using an ANN, employing a back-propagation neural network to improve script identification accuracy. A. Ram Bharadwaj et al. [12] built a model for Telugu text extraction and recognition using CNN and a recurrent neural network (RNN) with their own data set, achieving an accuracy of 81% on 100 random words selected from the validation set.

III. PROPOSED METHOD
In image-based classification, the CNN architecture is a popular choice. To distinguish features like edges and forms, CNNs apply a range of filters composed of trainable parameters to an image. These high-level filters often utilize weights learned from the spatial attributes of each subsequent level to capture an image's spatial properties. The design is controlled by hyper parameters, which may influence both the network architecture and its training and are set before training begins.
Hyper parameters used for building the network are:
 Hidden layers: the number of layers between the input and output.
 Dropout: employed to prevent overfitting.
 Activation function: used to add nonlinearity.
 Learning rate: specifies the rate at which the network's parameters are updated.
 Momentum: keeps the updates from oscillating.
 Epoch: one complete prediction cycle of the neural network.
 Batch size: the number of subsamples sent to the network.

To speed up CNN model training, all of the images in the constructed data set were labelled, reduced to 70 × 70 pixels, converted to greyscale, and saved in .npz format. The Telugu guninthalu dataset was trained and tested on a CNN with three hidden layers. The network consists of three convolutional layers, two Maxpooling layers, and a fully connected layer. The first hidden layer is a Convolution2D layer with 256 filters, each with a kernel size of 3 × 3 and a stride of 1, followed by a Maxpooling layer with a pool size of 2 × 2. A padding layer has been added to the design to keep the original input size.
Each of the 256 convolution filters in the second convolution layer has a kernel size of 3 × 3, followed by a Maxpooling layer of 2 × 2. The network then adds a further 256-filter convolution layer and a Maxpooling layer of pool size 2 × 2. Before the fully connected layers, a Flatten layer transforms the two-dimensional matrix data into a one-dimensional vector. Next, a fully connected layer with ReLu activation is used. Dropout, a regularization layer, is set to randomly eliminate 20% of the layer's neurons to prevent overfitting. Finally, a sigmoid activation function is applied to the 16-neuron output layer. The activation function is an essential part of a CNN; it determines the output of a neuron for a given set of inputs and introduces nonlinearity into the model. Selecting the appropriate activation function improves a CNN model's performance. The ReLu and Sigmoid activation functions are used in the proposed model.
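The feature-map sizes implied by this stack can be traced with a small sketch (assuming three convolution blocks, each with 3 × 3 kernels, stride 1, "same" padding, and 2 × 2 pooling, as in the layer-by-layer description; the resulting numbers are illustrative, not the paper's parameter table):

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """Trainable parameters of a conv layer: (k*k*in_channels + 1 bias) per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 70, 1                   # 70x70 greyscale input
for _ in range(3):                       # three conv + pool blocks
    size = pool_out(conv_out(size))      # 'same' padding keeps size, pooling halves it
    channels = 256

flat = size * size * channels            # length of the flattened vector fed to the dense layers
```

With these assumptions the spatial size shrinks 70 → 35 → 17 → 8 before the Flatten layer.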

1) Rectified Linear Unit (ReLu):
It is a biologically and mathematically sound activation function [14]. If the input is negative, ReLu returns 0; if the input is positive, it returns the value itself (Eq. 1). ReLu's max operation computes more quickly than other activation functions, and it is the default activation function for many kinds of neural networks.
When applied to a matrix or vector, ReLu operates element by element to produce its output. Nonlinearity is vital for a CNN, since every layer in the network contributes to it.
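Eq. 1 is the standard ReLu definition, ReLu(x) = max(0, x); a minimal NumPy sketch of its element-wise behaviour (the input values here are hypothetical):

```python
import numpy as np

def relu(x):
    """Eq. 1: ReLu(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

# Negative inputs are zeroed, positive inputs pass through unchanged.
out = relu(np.array([-2.0, -0.5, 0.0, 3.0]))
```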
2) Sigmoid: As Eq. 2 indicates, the sigmoid is a probabilistic approach to decision-making with a range of 0 to 1. We used this activation function to predict the output, since it yields more accurate probability estimates.
It is beneficial for models that output a probability. Because a probability always lies between 0 and 1, the differentiable sigmoid function is a natural choice; being differentiable, it yields the slope of the curve at any point.
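Eq. 2 is the standard sigmoid, σ(x) = 1 / (1 + e^(−x)); the sketch below also shows its derivative, σ(x)(1 − σ(x)), which gives the slope mentioned above:

```python
import numpy as np

def sigmoid(x):
    """Eq. 2: sigma(x) = 1 / (1 + exp(-x)), squashing any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Slope of the sigmoid curve: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1 - s)
```

The slope peaks at x = 0 (value 0.25) and flattens toward the extremes, which is why sigmoid outputs saturate for large inputs.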
Training and validation errors were calculated after each cycle; training is complete when the training and validation errors have not changed considerably over recent epochs. The network was then evaluated on the test set. Optimizers are vital for improving accuracy and decreasing error: they alter the weights in order to minimize the loss function. To obtain the best possible design, we compared a set of optimizers under common constraints. Adaptive Moment Estimation (Adam), Root Mean Square Propagation (RMSprop), and Stochastic Gradient Descent (SGD) were used in the proposed model.

3) Adaptive Moment Estimation (Adam):
Adam is an adaptive learning rate technique that computes individual learning rates for each parameter. Like momentum, it stores an exponentially decaying average of past squared gradients and preserves the value of past gradients. It may also replace stochastic gradient descent when updating the network weights during training. The gradients are used to estimate the moments via exponentially moving averages (Eq. 3 and 4), where m_t and v_t are the moving averages, g_t is the gradient, β1 and β2 are the forgetting (decay) factors for the first and second moments of the gradients, and the index t is the current training iteration.
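Eq. 3 and 4 are not reproduced in this copy; reconstructed from the description above, they take the standard Adam form:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t       % Eq. (3): first-moment (mean) estimate
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2     % Eq. (4): second-moment (uncentered variance) estimate
```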

4) Root Mean Square Propagation (RMSprop):
This approach is similar to gradient descent with momentum, but it restricts oscillations in the vertical direction; the learning rate can therefore be increased, letting the algorithm take larger horizontal steps toward rapid convergence. The gradient is normalized by the magnitude of recent gradients, and because each parameter effectively receives its own learning rate, the pace of learning adapts automatically. The parameters are updated using Eq. 5 and 6, where g_t is the gradient at time t, v_t is the exponential average of the squared gradients, and η is the learning rate, set here to 0.001.
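Eq. 5 and 6 are likewise missing from this copy; reconstructed from the symbols defined above, they take the standard RMSprop form:

```latex
v_t = \beta v_{t-1} + (1 - \beta)\, g_t^2                              % Eq. (5): running average of squared gradients
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}}\, g_t     % Eq. (6): parameter update, with eta = 0.001
```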

5) Stochastic gradient descent:
Stochastic Gradient Descent (SGD) is a simple yet efficient optimization algorithm used to find the parameters/coefficients of functions that minimize a cost function. In other words, it is used for discriminative learning of linear classifiers under convex loss functions such as SVM and Logistic regression.
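A minimal sketch of the SGD update rule, θ ← θ − η ∇J(θ), minimizing a toy convex loss (the loss function and learning rate here are hypothetical):

```python
def sgd_step(theta, grad, lr=0.01):
    """One SGD update: move the parameter against the gradient of the loss."""
    return theta - lr * grad

# Toy convex loss J(theta) = (theta - 3)^2, whose minimum is at theta = 3.
theta = 0.0
for _ in range(500):
    grad = 2 * (theta - 3)              # dJ/dtheta
    theta = sgd_step(theta, grad, lr=0.05)
```

Because the loss is convex, repeated steps drive theta toward the minimizer at 3.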

A. Data Set
Due to the lack of publicly available training data for Telugu characters, we had to construct our own dataset. We first collected handwritten Telugu guninthalu from people in various formats and scanned them into individual character images. Fig. 3 shows scanned copies of handwritten guninthalu collected both online and offline. Each of the 21 guninthalu contains 16 characters; with 820 handwritten samples of each character, the data set totals 275,520 handwritten characters.
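The preprocessing described in Section III (reducing each scan to 70 × 70 greyscale and saving to .npz) can be sketched as follows; the nearest-neighbour resize, the file name, and the sample array are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an HxWx3 RGB image to greyscale with the usual luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=70):
    """Nearest-neighbour resize of a 2-D image to size x size pixels (a simple stand-in)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

# Hypothetical sample: one 100x120 RGB scan reduced to a 70x70 greyscale array.
scan = np.random.rand(100, 120, 3)
char = resize_nearest(to_grayscale(scan))

# Store images and labels together in compressed .npz form, as the paper describes.
np.savez_compressed("guninthalu.npz", images=char[None], labels=np.array([0]))
```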

V. REQUIREMENTS
The backend engine for the proposed model is TensorFlow 2.2, and the CNN was implemented using the data set built above. The experiments were run on a 64-bit operating system with an Intel i7-4770 CPU running at 4.00 GHz and 16 GB of RAM.

A. Results
The proposed model is applied to the data set described above. This research aims to develop a system for recognizing handwritten Telugu characters, which relies heavily on classification. Table I displays the model's layer-by-layer attributes; the proposed network has a total of 1,986,640 trainable parameters.
Metrics including recall, precision, and accuracy [16] are used to assess the proposed system. The training and testing accuracy, loss, precision, and recall with the Adam, RMSprop, and SGD optimizers are shown in Fig. 7 to 11. The results show that the RMSprop optimizer achieves a higher testing accuracy than the ADAM and SGD optimizers. The mean values over all the guninthalu are used to determine the overall training and testing accuracy, precision, and recall. Tables III to VI show that, compared to the ADAM and SGD optimizers, the RMSprop optimizer performs well on all guninthalu.
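The metrics above follow their standard definitions (accuracy = correct / total, precision = TP / (TP + FP), recall = TP / (TP + FN)); a small sketch with hypothetical labels:

```python
def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from true and predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels: tp = 2, fp = 1, fn = 1, 3 of 5 predictions correct.
acc, prec, rec = accuracy_precision_recall([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

For the 16-class guninthalu problem, per-class precision and recall would be computed the same way, treating each character in turn as the positive class.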
There are 820 handwritten samples of each character used to train the model. Each of the 21 guninthalu contains 16 characters, for a total of 21 × 16 × 820 = 275,520 handwritten characters, which are used to train the model as shown in Table VII. After training, the model is checked on 20% of the data, i.e. 55,104 characters, and evaluated on these 55,104 data points using Accuracy, Precision, Loss, and Recall.

B. Augmentation
Data augmentation is a method of artificially creating new training data from existing data, by transforming examples from the training set into new and distinct training examples using domain-specific techniques.
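As a toy illustration of such a transformation (a simple shift-based augmentation in NumPy; the paper does not specify its exact augmentation operations, and np.roll wraps pixels around the edges, so this is a sketch rather than the authors' method):

```python
import numpy as np

def augment_shifts(image, max_shift=2):
    """Generate shifted copies of a character image as extra training examples."""
    out = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            out.append(np.roll(image, (dy, dx), axis=(0, 1)))
    return np.stack(out)

# Hypothetical 70x70 character: a 10x10 block of ink in the centre.
img = np.zeros((70, 70))
img[30:40, 30:40] = 1.0
augmented = augment_shifts(img)   # 25 shifted variants of one sample
```

One sample becomes 25, which is how augmentation multiplies the effective training-set size.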

C. Dropout
Dropout is a technique for preventing overfitting in a model; it operates by randomly zeroing the outgoing edges of hidden units during training. As the augmentation factor is increased, the validation accuracy improves. After data augmentation, the test accuracy increased to 90.12% with the ADAM optimizer, 94.26% with RMSprop, and 86.72% with SGD, as shown in Fig. 12 to 14.

VI. CONCLUSION
With CNN, our goal is to improve the quality of handwritten Telugu gunintham identification. The Adam, RMSprop, and SGD optimizers were critical in improving the model's accuracy and performance; dropout was used to prevent overfitting, with ReLu and Sigmoid as the activation functions. Among the CNN models, the RMSprop optimizer outperformed Adam and SGD in terms of accuracy. This model might be adjusted to enhance the identification of other handwritten Telugu characters, and in the future we intend to continue experimenting with new approaches to improve the identification of Telugu handwritten guninthalu.