Evaluating the Quality of a Person’s Calligraphy using Image Recognition

Failing to develop good handwriting as a child has serious consequences for learning, ranging from memory training to the capacity for innovation. Assessing the quality of a person’s handwriting requires processing large numbers of images. Current improvements in machine learning make this process increasingly precise, but developing such algorithms is complicated. For this reason, this article presents a proposal to evaluate the quality of a person’s handwriting through image recognition, in order to assist in its improvement while keeping the image processing practical. The evaluation is carried out on a group of university students from the Arequipa region of Peru, using an image processing system that performs character recognition. According to the degree of proximity to a reference script, the level of handwriting is quantified as a percentage. The tests carried out show that the quality of calligraphy among university students in the Arequipa region varies between low and medium.
Keywords—Calligraphy; image processing; character recognition


I. INTRODUCTION
The poor quality of calligraphy is a latent problem because of its repercussions on academic learning. In recent years, with the advance of the digital age, the art of handwriting has been set aside in favor of digital writing. Voice recognition has even begun to replace typing on an electronic keyboard, leaving conventional calligraphy adrift. In Peru, schools still teach calligraphy, but the skill soon fades because of students' constant use of computer keyboards.
Digital writing offers benefits such as hypertextuality, where texts refer to other texts and to information in multimedia formats; interactivity, which allows texts to be multidirectional; and simplified language. These are essential elements that increase expressive potential [1]. Despite these benefits, students need to develop good calligraphy, as it allows them to better develop their learning skills.
Given the above and the great advances in image processing techniques such as character recognition systems, a tool was developed to evaluate the quality of a person's handwriting through image recognition. The developed system considers Script typography. It takes as input a template provided to the student, evaluates the characters found in the image, and compares them with the characters of an ideal or desired script. The result is the percentage of similarity between the input text and the desired text.
The development of image recognition software for handwritten characters will help improve the quality of people's calligraphy, thus promoting the exploitation of the advantages it offers. The image recognition techniques (algorithms) reviewed in the state of the art will allow students and researchers to become familiar with the area, and will also serve as the basis for future research projects.
The rest of the article is organized as follows: Section II presents the State of the Art of the investigation. Section III presents the Theoretical Framework necessary for the development of the proposal. Section IV presents the Work Methodology, followed by the Tests and Results in Section V. Finally, our Conclusions and Recommendations are presented.

II. STATE OF THE ART
It is no secret in the research field that systems exist for recognizing alphanumeric characters. Most, however, are limited to recognizing characters printed by a computer; scanners are a typical example. Various technologies, with their respective algorithms, have therefore emerged to recognize handwritten alphanumeric characters. The publication [2] explains the operation of a system that takes a photo, with a webcam, of a data form filled in by hand, recognizes the entries, and saves them in the database in their corresponding fields. Likewise, the project [3] builds a system based on neural networks for the recognition of hand-drawn characters. The project is divided into two phases. The first is training, which uses the "resilient backpropagation" algorithm and works with a training data set consisting of a series of drawings of handmade characters. The second is testing, which seeks to determine how effective the training of the neural network has been. To this end, the system is tested with new information that it has never seen. At the end of this phase, the degree to which the system correctly recognizes each character entered is obtained. The publication [4] also uses gradients to extract the properties of characters. A Sobel mask is used to obtain the gradient at each pixel, and the gradients are then grouped into 12 directions. Since each character is represented by a 32x32 matrix, there are 1024 direction values to be used in training the neural network. A performance of 97% is obtained. It should be noted that the characters analyzed are three in Hindi, one in English and one special character.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 8, 2020
In addition, the work [5] investigates the impact of detecting text using global features for the benefit of Content-Based Image Retrieval, and proposes a strategy. The experiments carried out show that the proposed strategy achieves greater precision (15%) in the retrieval of digital images. In the thesis [6], an algorithm is implemented using image processing techniques in order to recognize numerical characters located on a token device. On a total of 80 photographs, these processing techniques reached 99.5% efficiency.
The article [10] uses classifiers based on a Deep Convolutional Neural Network (DCNN), reaching an accuracy of 96.40%, which outperforms some prominent existing machine learning techniques.
In the article [11], a hybrid model is created by evaluating the results of each CNN and selecting the best value. In the experiment carried out on the test data set, the created model achieves a performance increase of 1.1%.

III. THEORETICAL FRAMEWORK
This section details the main concepts used in the work methodology. They are divided into two parts: Basic Concepts in Image Recognition and Basic Concepts in Image Processing.
A. Basic Concepts in Image Recognition
1) Size or Dimension: This is the space occupied by the letters in the words, or by the words in a line. It is based on the zones of the writing and on the size of the ovals of the letters. Writing is classified as small, medium or large, and one or more measurements can be taken [7].
2) Proportionality: This element can be considered a subset of the measurements obtained in the "Dimension" characteristic. However, this property can be based on the proportions of the upper and lower zones of the writing with respect to the central zone [7].
3) Shape: This refers to the type of writing. There are several subdivisions, grouped according to the features that the letters present: they can be curved, angular or complicated [7][8].

4) Inclination: This is the angle toward which the writing in a word, line or paragraph leans. Generally, two types are considered: to the right and to the left.
B. Basic Concepts in Image Processing
1) Capture: The first step is to obtain the image to work with, which must be scanned. To obtain an appropriate image, conditions such as lighting and resolution must be taken into account [7].
2) Preprocessing: The received image must be converted to a black and white image through a binarization process, which reduces it to values of 0 and 1. The colors must then be inverted, so that the background is completely black and the objects to work on are white [7].
3) Binarization: This converts the received image into a binary image, thus separating the background from the objects to be analyzed [9] [12] [13].
Global methods, which include simple thresholding methods, try to find a single threshold that applies to the entire image. Local methods obtain a threshold for each pixel in the image using the values of its neighbors; methods in this category include those of Niblack and Sauvola.
Local methods generally produce a better result when binarizing the image, even where the illumination of the document is variable. However, their processing and memory requirements make them difficult to implement on mobile devices.
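As a rough illustration of the difference between the two families, the following sketch binarizes a tiny synthetic image with a global threshold and with Sauvola's local rule. The image, the window radius, the global threshold of 128 and the parameters k and R are illustrative assumptions (commonly quoted defaults), not values taken from this work. Images are plain lists of gray levels, and ink is reported as 1 on a background of 0, which corresponds to the inverted convention described under Preprocessing.

```python
import math

def window_stats(gray, r, c, radius):
    """Mean and standard deviation of the gray levels in a square window around (r, c)."""
    rows, cols = len(gray), len(gray[0])
    values = [
        gray[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

def niblack_threshold(mean, std, k=-0.2):
    # Niblack: T = m + k * s (k is an assumed, commonly used value)
    return mean + k * std

def sauvola_threshold(mean, std, k=0.5, R=128.0):
    # Sauvola: T = m * (1 + k * (s / R - 1)) (k and R are assumed defaults)
    return mean * (1.0 + k * (std / R - 1.0))

def binarize_global(gray, threshold=128):
    """One threshold for the whole image; 1 = ink (dark pixel), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def binarize_local(gray, radius=1, rule=sauvola_threshold):
    """One threshold per pixel, computed from its neighbourhood; 1 = ink."""
    out = []
    for r, row in enumerate(gray):
        out_row = []
        for c, px in enumerate(row):
            mean, std = window_stats(gray, r, c, radius)
            out_row.append(1 if px < rule(mean, std) else 0)
        out.append(out_row)
    return out

# Synthetic page: well-lit on the left (paper = 200), shadowed on the right (paper = 90).
page = [
    [200, 200, 200, 90, 90, 90],
    [200, 100, 200, 90, 20, 90],
    [200, 200, 200, 90, 90, 90],
]

print(binarize_global(page))  # the shadowed half is entirely marked as ink
print(binarize_local(page))   # only the two dark strokes are marked as ink
```

On this example the global threshold labels the whole shadowed half as ink, while the local rule isolates the two strokes. Passing `rule=niblack_threshold` swaps in Niblack's rule, which tends to be noisier on flat background.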

4) Feature Extraction: This makes it possible to obtain characteristics such as size, perimeter and area, as well as handwriting features such as segment orientation.
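A minimal sketch of how size, area and perimeter might be measured on a binarized character is given below. It assumes a 0/1 image with ink as 1 and counts the perimeter as the number of pixel edges that face the background; the function name and this perimeter definition are illustrative choices, and segment orientation is not covered.

```python
def glyph_features(binary):
    """Measure a binarized glyph given as rows of 0/1 values (1 = ink)."""
    ink = [(r, c) for r, row in enumerate(binary) for c, px in enumerate(row) if px]
    if not ink:
        return {"area": 0, "perimeter": 0, "height": 0, "width": 0}
    ink_set = set(ink)
    # Perimeter: pixel edges whose 4-connected neighbour is background or off-image.
    perimeter = sum(
        (r + dr, c + dc) not in ink_set
        for r, c in ink
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )
    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    return {
        "area": len(ink),                     # number of ink pixels
        "perimeter": perimeter,
        "height": max(rows) - min(rows) + 1,  # bounding-box size
        "width": max(cols) - min(cols) + 1,
    }

block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(glyph_features(block))
# {'area': 6, 'perimeter': 10, 'height': 2, 'width': 3}
```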

C. Correlation Coefficient
The correlation coefficient is a measure of the linear relationship between two quantitative random variables. Less formally, it is an index that can be used to measure the degree of relationship between two variables, as long as both are quantitative.
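For reference, the standard sample (Pearson) form of the coefficient for n paired observations x_i and y_i, with means \bar{x} and \bar{y}, can be written as follows. The text does not state which variant the system uses, so reading x_i and y_i as the pixel values of two equally sized images is an assumption.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

A value of r = 1 indicates a perfect positive linear relationship, r = 0 no linear relationship, and r = -1 a perfect negative one.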

D. Training
Training is the preparation needed to perfect the performance of an activity (here, character recognition). A minimum and a maximum number of training inputs should be established so that training is as efficient as possible.

IV. WORK METHODOLOGY
A. Proposal Outline
For the proposal of this work, two entities were identified: "User", the person who uses the application, and "System", the application that has been implemented. The user starts the system and accesses the "load an image" operation so that the system can work with the image. The application performs image processing, which consists of capturing the image, cropping the region of interest, and segmenting the objects (the work material within the crop), and then applies the respective character recognition algorithms (with the appropriate prior training). Subsequently, the calligraphy quality evaluation is carried out for each letter and for all the recognized text. Finally, a plain text file with the detailed result from the system is generated as output. The outline is shown in Fig. 1.

1) Training: This process must be carried out before starting the system. It consists of building a database of images of all the characters (the 26-letter lowercase alphabet; the letter ñ is not considered for this process). Each character must have at least 10 copies, in order to increase the efficiency of the algorithm in recognizing each character and evaluating the quality of calligraphy. For the present project, more than 25 copies were taken per letter, obtaining an efficiency greater than 90% (see Fig. 2).

Fig. 2. Character Training
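The training database described above could be organised along the following lines. This is only a sketch: the class name `TemplateStore`, its methods and the in-memory storage are hypothetical, and the only figures taken from the text are the 26-letter lowercase alphabet (without ñ) and the minimum of 10 copies per character.

```python
import string
from collections import defaultdict

MIN_COPIES = 10                    # lower bound stated in the text
ALPHABET = string.ascii_lowercase  # 26 letters; "ñ" is excluded, as in the text

class TemplateStore:
    """Hypothetical in-memory database: letter -> list of sample images."""

    def __init__(self):
        self._samples = defaultdict(list)

    def add(self, letter, image):
        """Register one handwritten sample of a letter."""
        if len(letter) != 1 or letter not in ALPHABET:
            raise ValueError(f"unsupported character: {letter!r}")
        self._samples[letter].append(image)

    def count(self, letter):
        return len(self._samples.get(letter, []))

    def missing(self):
        """Letters that still have fewer than MIN_COPIES samples."""
        return [ch for ch in ALPHABET if self.count(ch) < MIN_COPIES]

    def is_ready(self):
        """True once every letter has enough samples to start the system."""
        return not self.missing()

store = TemplateStore()
dummy = [[0, 1], [1, 0]]           # placeholder image, not real data
for _ in range(MIN_COPIES):
    store.add("a", dummy)
print(store.count("a"), store.is_ready(), len(store.missing()))
# 10 False 25
```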
2) Upload digital image to the System: In this process, the user uploads a digital image to the system. The image must contain handwritten text, which is what the system will work on.
• Format Content: The text written by the user in the provided format must be as clear as possible, and written with a black "fine pen", for the sake of system efficiency.
3) Image Processing: Once the image is loaded in the system, image processing proceeds in three parts: a) the entire image is captured at a size defined by the system in order to normalize the work object (image); b) the system automatically crops the image, delimiting only the region to be worked on; and c) each letter identified by the system within the delimited region is segmented, also at a size defined by the system.
• Chars Segmentation: The system segments the work objects (characters) based on gray-level discontinuities, that is, it segments the image at large changes in gray level between pixels.
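The cropping and segmentation steps can be sketched as follows. Note that this is a simpler stand-in, not the discontinuity-based method described above: it assumes the image has already been binarized (1 = ink), crops to the bounding box of the ink, and splits characters wherever a column contains no ink. The function names are illustrative, and size normalization is omitted.

```python
def crop_to_ink(binary):
    """Crop a 0/1 image to the bounding box of its ink pixels (region of interest)."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return []
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]

def split_characters(binary):
    """Split a cropped line of text into glyphs at columns that contain no ink."""
    if not binary:
        return []
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    glyphs, start = [], None
    # The trailing False acts as a sentinel that closes the last glyph.
    for c, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = c
        elif not ink and start is not None:
            glyphs.append([row[start:c] for row in binary])
            start = None
    return glyphs

line = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
roi = crop_to_ink(line)
glyphs = split_characters(roi)
print(roi)     # [[1, 1, 0, 0, 1], [1, 0, 0, 0, 1]]
print(glyphs)  # [[[1, 1], [1, 0]], [[1], [1]]]
```

Column-gap splitting only works when letters do not touch, which fits separated script letters but would not handle joined handwriting.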

4) Recognition of Character in Image: In this process, the system applies the character recognition algorithm (correlation coefficient) to each object identified in the segmentation process, and then presents an evaluation of the quality of calligraphy for each recognized letter.
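One way to picture recognition by correlation coefficient is the sketch below. It assumes one reference image per letter, all the same size as the segmented glyph, and it maps the coefficient to a percentage by clamping negative values to 0; the tiny 3x3 templates, the function names and that mapping are illustrative assumptions rather than details given in the text.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equally sized images (lists of rows)."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((p - mx) * (q - my) for p, q in zip(x, y))
    den = math.sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))
    return num / den if den else 0.0  # a flat image has no defined correlation

def recognise(glyph, templates):
    """Return (best matching letter, similarity in percent)."""
    best_letter, best_r = None, -1.0
    for letter, template in templates.items():
        r = correlation(glyph, template)
        if r > best_r:
            best_letter, best_r = letter, r
    # Assumed mapping: negative correlation counts as 0% similarity.
    return best_letter, round(max(best_r, 0.0) * 100, 2)

# Toy reference shapes standing in for computer-generated script letters.
templates = {
    "l": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "o": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}
sloppy_l = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

letter, score = recognise(sloppy_l, templates)
print(letter, score)
```

Here the glyph is matched to "l" with a similarity a little under 80%, because one stray pixel lowers the correlation with the reference shape.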

5) Chars Calligraphy Assessment: In this process, the calligraphy quality of each recognized character is classified, taking the script letter of a computer as reference. Scores range from 0% (letter not recognized relative to the format text) to 100% (total equality between a recognized letter and a letter generated by a computer).
• Calligraphy Quality Rating: Different ranges are established for evaluating the quality of the user's partial (each recognized character) and total (complete phrase) calligraphy. These are listed in Table I. The ranges presented were derived from experimentation with people in the city of Arequipa, Perú.
6) Plain Text Generation as Output: As a last process, the system generates a plain text file with detailed information on the identified and recognized letters, classifying them partially and totally, and highlighting the weighted quality of the calligraphy evaluated in the image.

V. TESTS AND RESULTS
A. Training Results
A template (see Fig. 4) was made for writing all the letters that are objects of interest to the system, in order to train it. The templates were filled in by anonymous volunteers. More than 25 samples were taken per letter.

B. Test Results with Automatic Format
Once the system was trained, it could be put into operation. The calligraphy of an anonymous, randomly selected person was evaluated; the input image format was used in this part (Fig. 3).
A lowercase text string is established for the user to write and submit for evaluation. It was chosen at random and must also contain all the letters: "david exige plazo fijo o embarque truchas a new york", shown in Fig. 5. The running system requests the input image to work on. Given the input format, the system automatically crops the area of interest (user writing) and begins the letter-by-letter evaluation. Fig. 6 shows the main system interface. The quality results are obtained for each letter (see Fig. 8). The most useful information is presented at the end of the details: the mode obtained from all the evaluated letters, by which the final calligraphy quality result is established. In the present example (see Table II), the user obtained a mode of 30.67%, that is, bad calligraphy.
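The final score described above, the mode of the per-letter results, could be computed along these lines. The scores in the example are made up for illustration and are not results from this work; rounding to two decimals and returning the lowest value when several tie are assumptions.

```python
from statistics import multimode

def overall_quality(letter_scores):
    """Mode of the per-letter similarity scores (percent), rounded to 2 decimals.

    If several values are equally frequent, the lowest one is returned
    (an assumed, conservative tie-break).
    """
    rounded = [round(score, 2) for score in letter_scores]
    return min(multimode(rounded))

# Hypothetical per-letter scores, not measured data.
scores = [62.5, 40.0, 62.5, 71.25]
print(overall_quality(scores))  # 62.5
```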

C. Results of Manual Tests
For this test (see Fig. 9), the area of interest (user writing) was cropped manually from the image format and given to the system as input, after which the system performs the letter-by-letter evaluation. The same image used in the previous section (Test Results with Automatic Format) is taken as input. As before, the quality results are obtained for each letter (see Fig. 10), and the most useful information, the mode obtained, is presented at the end of the details. In the present example, the user obtained a mode of 40.67%, that is, regular calligraphy.
System tests have been carried out on a certain number of people. The following table describes the results obtained, in order to establish ranges for the calligraphy quality classification.
VI. CONCLUSIONS
An image recognition system that can assess the quality of a person's calligraphy has been proposed and presented. The system establishes the calligraphy quality classification arbitrarily with the aim of monitoring it and encouraging the improvement of calligraphy by end users.
In the training stage, it was found that when the system is given too many input samples, "overtraining" occurs, which lowers the performance of the system considerably.
The tests carried out show that an automatic test yields lower results than a manual test. This is due to the automated cropping of the area of interest in the system, as seen in the results section.
Today, in the field of character recognition in images, we are very close to 100% efficiency. In summary, a working methodology has been developed to evaluate the quality of a person's calligraphy with good results, based on real experimentation.

VII. RECOMMENDATIONS
Due to the lower performance of an automatic test compared to a manual test, it is recommended to implement an algorithm that performs the crop optimally, so as not to reduce the image quality and thus obtain the expected results.
It is recommended to monitor the training stage closely in order to establish an optimal range of inputs for it.
If many characteristics of a handwritten text are to be evaluated, it is recommended to use machine learning or other algorithms, which are more difficult to develop but more efficient.
Based on the results obtained in the tests carried out in this project, it is proposed to carry out as future work a new methodology for the improvement of calligraphy of students in the Arequipa region.
The developed proposal can be taken as the basis for future projects in which the typography options in the input text are increased.