Towards Secure IoT Communication with Smart Contracts in a Blockchain Infrastructure

The Internet of Things (IoT) is undergoing rapid growth in the IT industry, but it continues to be associated with several security and privacy concerns as a result of its massive scale, decentralised topology, and resource-constrained devices. Blockchain (BC), a distributed ledger technology used in cryptocurrency, has attracted significant attention in the realm of IoT security and privacy. However, adopting BC for IoT is not straightforward in most cases, due to overheads and delays caused by BC operations. In this paper, we apply a BC technology known as Hyperledger Fabric to an IoT network. This technology introduces an execute-order technique for transactions that separates transaction execution from consensus, resulting in increased efficiency. We demonstrate that our proposed IoT-BC architecture is sufficiently secure with regard to fundamental security goals, i.e., confidentiality, integrity, and availability. Finally, simulation results show that the performance overheads associated with our approach are as minimal as those associated with the Hyperledger Fabric framework and negligible in terms of security and privacy.

INTRODUCTION
In this modern technological and digital age, optical character recognition (OCR) systems play a vital role in machine learning and automatic recognition problems. OCR is a software tool that converts printed text and images to machine-readable form and enables the machine to recognize images or text as humans do. OCR systems are commercially available for isolated languages, which include Chinese, English, Japanese, and others. However, few OCR systems are available for cursive languages such as Persian and Arabic, and those that exist are not highly robust. To the best of our knowledge, no commercial OCR system is available for handwritten Pashto letters recognition; however, such systems exist in research labs.
Handwritten letters recognition is a daunting task, mainly because of variations in the writing styles of different users. Handwritten letters recognition can be done either offline or online. Online character recognition is simpler and easier to implement owing to temporal information such as velocity, time, number of strokes, and writing direction. In addition, the trace of the pen is only a few pixels wide, so no thinning techniques are required for classification. Offline character recognition, on the other hand, is more laborious to implement because of the high variations in the writing and font styles of every user. In this paper, we present the recognition of handwritten Pashto letters.
Pashto is a major language of the Pashtun tribe in Pakistan and the official language of Afghanistan. According to census estimates for 2007-2009, about 40-60 million people around the world are native speakers of this language. Pashto letters can be shaped into six different formats, which makes the recognition process challenging. Furthermore, the number of dots and their position vary from letter to letter, which adds to the difficulty. To address these problems, research suggests the use of high-level features based on the structural information of letters. An OCR system using a deep learning network model that incorporates bi- and multi-dimensional long short-term memory has been suggested for printed Pashto text recognition [1].
A web-based survey shows that Pashto script contains a huge number of unique ligatures [2]. Such ligatures make the implementation of an OCR system for handwritten Pashto challenging. Printed letters have a constant shape, style, and font size; the said technique therefore fails in our case owing to the higher variations in style and font of handwritten letters. Riaz et al. [3] presented the development of an OCR system for cursive Pashto script using the scale-invariant feature transform and principal component analysis. To address this issue, we present a system for handwritten Pashto letters recognition, which has the following key contributions:
• As there is no standard handwritten Pashto letters database for testing an algorithm, one contribution of this work is to develop and present a medium-sized database of 4488 samples (102 samples for each of the 44 letters) for further research work.
• The second contribution of this research work is to provide a base result as a benchmark for the Pashto language. For this purpose, the performance results of two state-of-the-art classifiers, KNN and a deep neural network, are reported based on zoning features.
• Our proposed handwritten Pashto letters recognition system is efficient, simple, and cost-effective.
The rest of the paper is organized as follows. Section I reviews the related work. Section II provides information about the classifiers and feature extraction algorithm used in this research work. Section III delineates the methodology. Section IV discusses feature extraction, which is very important in the area of pattern recognition and machine learning, while Section V demonstrates the experimental results, followed by the conclusions and future work in Section VI.
I. RELATED WORK
Pashto, Persian, Urdu, and Arabic are sister languages. Several diverse approaches have been suggested by different researchers for developing an OCR system for these languages. However, Pashto script contains more letters (44) than Arabic script (28 letters), Persian script (32 letters), and Urdu script (38 letters). Pashto encapsulates all the letters of the Urdu script with seven additional letters. These additional seven letters make the OCR systems developed for Persian, Urdu, and Arabic unable to recognize handwritten Pashto letters. To the best of our knowledge, the most closely related work on these languages is summarized below.
Abdullah et al. [4] presented an OCR system for Arabic handwriting recognition based on Neural Network classifier for classifying an IFN-ENIT dataset. Ahmad et al. [5] presented a novel approach of gated bidirectional long short term memory (GBLSTM) for recognition of printed Urdu Nastaliq text, which is a special form of Neural Network based on ligature information of the printed text. Ahmed et al. [6] used a one dimensional BLSTM for handwritten Urdu letter recognition where a medium size database for handwritten Urdu letters collected from 500 people was developed.
Alotaibi et al. [7] suggested an algorithm to develop an OCR system that can check the originality and similarity of online Quranic content, where Quranic text is a combination of diacritics and letters. They used region-based algorithms for diacritic detection and a projection method for letter detection. The resulting similarity indices are compared with the standard Mushaf Al Madina benchmark. Boufenar et al. [8] presented a supervised learning technique named Artificial Immune System, based on the zoning technique, for isolated handwritten Arabic letters recognition. Jameel and Kumar [9] suggested the use of B-spline curves as a feature extractor for offline Urdu character recognition. Naz et al. [10] [11] presented the use of a multi-dimensional recurrent Neural Network based on statistical features for Urdu Nastaliq text recognition. Rabi et al. [12] surveyed different OCR systems for handwritten cursive Arabic and Latin script recognition and concluded that contextual sub-character Hidden Markov Models achieved high accuracy for handwritten Latin and Arabic script recognition.
Rouini et al. [13] presented the use of a dynamic random forest classifier based on the SURF descriptor feature extraction technique. Sahlol et al. [14] examined different optimization algorithms, namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Grey Wolf Optimization (GWO), and the BAT algorithm (BAT), for handwritten Arabic characters recognition. After testing each algorithm, it was concluded that GWO provides the most prominent results for handwritten Arabic characters recognition. As the Sindhi language is a superset of the Arabic language, Shaikh et al. [15] developed an OCR system for text recognition using an approach based on segmentation.
M. Kumar et al. [16] presented a comprehensive survey of Indic and non-Indic scripts on letters and numeral recognition. Zayene et al. [17] presented a novel approach for Arabic video text recognition using recurrent Neural network. This system suggests a segmentation free method mainly based on a multidimensional version of long short term memory combined with a connectionist temporal classification layer. Veershetty et al. [18] suggested the concept of an optical character recognition (OCR) system for handwritten script recognition based on KNN, SVM, and linear discriminant analysis (LDA) classifiers. For feature extraction, they used a technique based on Radon and wavelet transform, and words were extracted using morphological dilation methods.
Malviya et al. [19] carried out a comparative study of various feature extraction techniques, namely Zernike moments, projection histogram, zoning methods, template matching, and chain coding, and discussed classification algorithms such as SVM and Artificial Neural Network (ANN). Some vital parameters are selected based on sample size, data types, and accuracy. Bhunia et al. [20] presented a novel approach for word-level Indic-script recognition using character-level data in the input stage. This approach uses a multimodal Neural Network that accepts both offline and online data as input in order to exploit the information of both modalities for text/script recognition. This multi-modal fusion scheme combines offline and online data, which is indeed a realistic scenario for the data being fed to the network. The validity of this system was tested for English and six Indian scripts. Obaidullah et al. [21] carried out a comprehensive survey on the development of OCR systems for Indic script recognition in multi-script document images. Multiple pre-processing techniques, feature extraction techniques, and classifiers used in script recognition were discussed.
The literature review shows that little work is available on the development of an OCR system for the recognition of printed Pashto letters, and that no OCR system has been developed for the automatic recognition of handwritten Pashto letters. All the above-mentioned algorithms perform well for the specified languages but fail to recognize handwritten Pashto letters owing to the extra letters in the character set. In this paper, we present a robust OCR system for the recognition of handwritten Pashto letters having the key benefits mentioned above.

II. BACKGROUND STUDY
This section describes the background of character modeling for Pashto script, followed by the KNN and Neural Network classification techniques.

A. Pashto
Pashto is the language of Pashtuns, often pronounced as Pakhto/Pukhto/Pushto, and is the official language of Afghanistan and a major language of the Pashtun clan in Pakistan. In Persian literature, it is known as Afghani, while in Urdu or Hindi literature it is known as Pathani. Pashto has two major dialects, namely the soft dialect and the hard dialect, which differ phonologically from each other. The soft dialect is called southern while the hard dialect is known as northern. In the soft (southern) dialect, the name is spelled Pashto, while in the hard (northern) dialect it is spelled Pukhto or Pakhto. The word Pashto is used to represent both the hard and soft dialects. The Kandahari form of Pashto dialect, also known as Pata Khazana, is considered the standard spelling system for Pashto script.
www.ijacsa.thesai.org
Pashto script consists of 44 letters, shown in Fig. 1, where Name represents the letter name and Alphabet represents the letter shape in isolated form. It borrows all 32 letters of the Persian script, which in turn borrows the entire set of 28 letters of the Arabic script. That is why Pashto is known as a modified pattern of Perso-Arabic characters. Urdu script adopts all 32 letters of the Persian script with 6 additional letters. Pashto script encapsulates all the Urdu characters, with minor changes in these 6 special Urdu characters, as shown in Table I. It also includes 7 additional characters specific to Pashto script, forming a set of 44 characters as shown in Table II. A word in Pashto script is formed by combining two or more isolated letters. Within a word, the shape of a letter changes with its position (start, middle, or end), as shown in Table III. Both Naksh and Nastaliq are used for writing Pashto script; however, Naksh is considered the standard writing style.

B. K-Nearest Neighbor (KNN)
KNN is a supervised learning tool used in regression and classification problems. In the training phase, KNN uses a multidimensional feature vector space in which a class label is assigned to each training sample. Many researchers have suggested the use of the KNN classifier in text/digit recognition and classification, such as Hazra et al. [22], who presented the KNN classifier for both handwritten and printed letters recognition in the English language based on a sophisticated feature extraction technique.
For online handwritten Gujarati character recognition, Naik et al. [23] suggested the use of SVM with polynomial, linear, and RBF kernels, KNN with various values of K, and multi-layer perceptrons (MLPs) for stroke classification based on a hybrid feature set. Selamat et al. [24] suggested the use of hybrid KNN algorithms for web page-based Arabic language identification and classification. They reported results based on SVM, back propagation neural network, KNN, and hybrid KNN. Zhang et al. [25] presented the use of KNN for visual category recognition based on texture, color, and particularly shape in a homogeneous framework. Hasan [26] presented the KNN classifier for Arabic (Indian) digits recognition using multi-dimensional features, which consist of discrete cosine transform (DCT) and projection methods. KNN generates classification results by storing all the available cases and classifying new cases based on a similarity measure (distance function). Pashto contains 44 letters in its character set, so there are 44 classes to be classified. In short, it is a multi-class recognition problem. Fig. 2 represents a basic multi-class KNN model, in which class1, class2, and class3 represent 3 different classes. In our case, the model contains 44 classes, one for each letter of the Pashto character set.
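The store-and-compare behaviour of KNN described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the Euclidean distance, the majority vote, the function name `knn_predict`, and the toy data are all assumptions made for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=1):
    """Classify one query vector by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every stored training sample (assumed metric).
    distances = np.linalg.norm(train_X - query, axis=1)
    # Indices of the k closest samples; a stable sort keeps ties in storage order.
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(train_y[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: three classes in a 2-D feature space.
train_X = np.array([[0.0, 0.0], [0.1, 0.2],
                    [1.0, 1.0], [0.9, 1.1],
                    [0.0, 1.0], [0.1, 0.9]])
train_y = np.array([0, 0, 1, 1, 2, 2])

pred_a = knn_predict(train_X, train_y, np.array([0.05, 0.1]), k=1)
pred_b = knn_predict(train_X, train_y, np.array([0.95, 1.0]), k=3)
print(pred_a, pred_b)
```

For the Pashto problem the same function would be applied unchanged, with 44 class labels and 16-dimensional feature vectors in place of the toy data.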

C. Neural Network (NN)
NN has played a vital role in recognition and classification problems. Inspired by the human nervous system, an ANN is composed of a layered architecture: input, hidden, and output layers. It contains a network of neurons connected through weighted connections that accepts input, performs processing, and produces detailed patterns. Machine learning (ML) has been widely used in a variety of applications. ML has been used for scheduling tasks in real time through cloud computing in the form of genetic algorithms [27]. Another study shows the use of ML models in genomics [28], where the goal is to detect variations and errors in genomics datasets that entail high variations. Decision trees and tabu search have been utilized to learn the dispatching rules for smart scheduling [29]. To explore active learning, exponential gradient exploration has been studied [30].
Owing to the high identification and recognition abilities of NN, especially in text recognition problems, multiple researchers have suggested the use of this model, some of which are mentioned here. Jameel et al. [31] presented a review paper on Urdu character recognition using NN, in which they suggested the use of B-spline curves as a feature extraction technique for Urdu characters recognition. Zhang et al. [32] presented the use of a recurrent NN for drawing and recognizing Chinese characters. Patel et al. [33] suggested the use of ANN for handwritten character recognition with the discrete wavelet transform as a feature extraction technique, which is based on an accurate level of multi-resolution analysis. A basic NN diagram for the handwritten Pashto letter recognition (HPLR) system is shown in Fig. 3. In this research work, an NN classifier is selected with two hidden layers, one input layer, and one output layer. A feature map of 16 distinct values based on the zoning technique is fed to the input layer and the expected results are calculated at the output layer.
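The layer structure described above (16 zoning features in, two hidden layers, one output per letter class) can be sketched as a forward pass in Python. Only the 16 inputs, the two hidden layers, and the 44 classes come from the text; the hidden-layer width of 32, the tanh and softmax activations, and the random untrained weights are assumptions chosen purely for illustration.

```python
import numpy as np

def init_network(layer_sizes, seed=0):
    """Create one (weights, bias) pair per layer transition with small random weights."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Map a batch of feature vectors, shape (batch, 16), to class probabilities."""
    a = x
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)          # hidden layers (tanh is an assumed activation)
    W, b = params[-1]
    logits = a @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax over 44 classes

# 16 inputs -> two hidden layers (assumed width 32) -> 44 outputs.
params = init_network([16, 32, 32, 44])
features = np.random.default_rng(1).random((5, 16))  # placeholder zoning features
probs = forward(params, features)
predicted = probs.argmax(axis=1)
print(probs.shape, predicted)
```

Because the weights are untrained, the predictions here are meaningless; the sketch only shows how a 16-value feature vector flows through the layers to a 44-way output.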

III. THE PROPOSED METHODOLOGY
The proposed OCR system for the recognition of handwritten Pashto letters is divided into three main steps as shown in Fig. 4.
• Database development for the handwritten Pashto letters.
• Feature extraction using the zoning technique.
• Classification and recognition using KNN and NN classifiers.

A. Database development for the handwritten Pashto letters
A medium-sized handwritten character database of 4488 characters (102 samples for each letter) is developed by collecting handwritten samples from different individuals. The samples are collected on A4-size paper divided into 6 columns so that several variants of a letter can be collected from the same person. These samples are then scanned into a computer-readable format as shown in Fig. 5 and Fig. 6. Each extracted letter in Table IV is heavily affected by dark spots, i.e., noise, which is removed using thresholding. During the data collection phase, the position of the handwritten character varies within the 64×64 region/box, because letters can be written at the top, left, right, or bottom of the box depending on the person. We have therefore centralized all the letters. The results after thresholding and centralizing are shown in Table V.
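The two pre-processing steps above, thresholding and centralizing, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this work: a fixed global threshold of 128, bounding-box centering, and the helper names `binarize` and `center_letter` are all hypothetical choices.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Return a boolean image where True marks ink (pixels darker than the threshold)."""
    return gray < threshold

def center_letter(ink):
    """Move the bounding box of the ink to the middle of the image."""
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:                      # blank box: nothing to move
        return ink.copy()
    crop = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    out = np.zeros_like(ink)
    r0 = (ink.shape[0] - crop.shape[0]) // 2
    c0 = (ink.shape[1] - crop.shape[1]) // 2
    out[r0:r0 + crop.shape[0], c0:c0 + crop.shape[1]] = crop
    return out

# Synthetic 64x64 box: white paper, one faint speck of noise, one dark stroke
# written near the top-left corner.
gray = np.full((64, 64), 255, dtype=np.uint8)
gray[50, 50] = 200          # faint speck, lighter than the threshold, so it is dropped
gray[2:12, 4:10] = 20       # dark stroke, 10 rows by 6 columns

ink = binarize(gray)
centered = center_letter(ink)
print(np.flatnonzero(centered.any(axis=1))[[0, -1]],
      np.flatnonzero(centered.any(axis=0))[[0, -1]])
```

A fixed threshold only removes noise that is lighter than the stroke; scans with darker blemishes would need a different cleaning step, which is outside this sketch.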

IV. FEATURE EXTRACTION
Selecting an astute, informative and independent feature is a crucial step for effective classification. This paper presents the concept of zoning method as a feature extractor technique for the recognition of handwritten Pashto letters.

A. Zoning Technique
This research work uses a 4×4 static grid to extract the features of each letter, as shown in Fig. 7. The zoning grid is superimposed on the character image and divides it into 16 equal zones. In each zone, the density of the letter is extracted, i.e., the ratio of the black pixels forming the letter to the total size of the zone [34]. In this way, a feature map for all 4488 letters is obtained for classification. Applying this technique yields a feature vector of 16 real values for each sample, because we focus on zones rather than on the number of pixels.
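The zoning computation above can be sketched in Python as follows. The sketch assumes a binarized 64×64 image in which True marks a black (ink) pixel, so that each of the 16 zones is 16×16 pixels; the function name `zoning_features` and the synthetic test image are illustrative only.

```python
import numpy as np

def zoning_features(ink, grid=4):
    """Return grid*grid densities: fraction of ink pixels in each zone, row by row."""
    h, w = ink.shape
    zh, zw = h // grid, w // grid           # zone height and width in pixels
    # View the image as (grid, zh, grid, zw) blocks and average inside each block.
    blocks = ink[:zh * grid, :zw * grid].reshape(grid, zh, grid, zw)
    return blocks.mean(axis=(1, 3)).ravel()

# Synthetic 64x64 letter image: zone 0 completely filled, zone 5 half filled.
ink = np.zeros((64, 64), dtype=bool)
ink[0:16, 0:16] = True      # top-left zone (index 0): 256 of 256 pixels
ink[16:24, 16:32] = True    # zone in row 1, column 1 (index 5): 128 of 256 pixels

features = zoning_features(ink)
print(features.shape, features[0], features[5])
```

Each density lies between 0 and 1 regardless of the stroke thickness in pixels, which is why the feature vector has a fixed length of 16 for every sample.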

V. RESULTS
This section summarizes the results obtained after applying KNN and NN classifiers to handwritten Pashto letters for classification/recognition.

A. Classification Accuracy of K-Nearest Neighbors
The results of the KNN classifier for Pashto script recognition, obtained using zoning features, are shown in Fig. 8. The image features of the Pashto letters are divided in a ratio of 2:1 for the training and testing phases. The database consists of 102 samples for each Pashto letter. Thus, the features of 68 samples per letter are selected for the training phase and those of the remaining 34 samples for the testing phase. An overall accuracy of about 70.05% is obtained for KNN, which is lower than the 72% obtained for the NN.
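The 2:1 split above amounts to 44 × 68 = 2992 training samples and 44 × 34 = 1496 test samples. A per-class split of this kind can be sketched as follows; the random selection within each class, the fixed seed, and the function name `split_per_class` are assumptions, since the text does not state how the 68 training samples per letter were chosen.

```python
import numpy as np

def split_per_class(labels, n_train, seed=0):
    """Pick n_train random samples of every class for training; the rest go to testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

# 44 letter classes with 102 samples each, as in the database described above.
labels = np.repeat(np.arange(44), 102)
train_idx, test_idx = split_per_class(labels, n_train=68)
print(len(train_idx), len(test_idx))
```

Splitting within each class keeps every letter equally represented in both sets, which matters for a 44-class problem with only 102 samples per class.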
The accuracy of the KNN classifier is tested for different values of K, the number of nearest neighbors. Fig. 9 shows the accuracy results for varying values of K. The highest accuracy is obtained for K equal to 1; as K increases, features of other classes enter the neighborhood and cause misclassification, so the accuracy varies with K.
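The effect of K described above can be reproduced on a deliberately tiny example. The data below are hypothetical one-dimensional points, not the Pashto features, and the Euclidean distance with majority voting is an assumed KNN variant; the point is only to show how a larger neighborhood lets samples of another class outvote the correct one.

```python
import numpy as np

def knn_accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test samples whose k-nearest-neighbour majority vote is correct."""
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    neighbor_labels = train_y[nearest]                      # shape (n_test, k)
    n_classes = int(train_y.max()) + 1
    # Count votes per class for every test sample, then take the winner.
    votes = np.apply_along_axis(np.bincount, 1, neighbor_labels, minlength=n_classes)
    predictions = votes.argmax(axis=1)
    return float((predictions == test_y).mean())

# Class 0 has a single stored sample; class 1 has three samples a little further away.
train_X = np.array([[0.0], [1.0], [1.1], [1.2]])
train_y = np.array([0, 1, 1, 1])
test_X = np.array([[0.2], [1.05]])
test_y = np.array([0, 1])

accuracies = {k: knn_accuracy(train_X, train_y, test_X, test_y, k) for k in (1, 3)}
print(accuracies)
```

With K = 1 the query at 0.2 is matched to its closest neighbor from class 0, while with K = 3 two class-1 samples join the vote and the query is misclassified, halving the accuracy on this toy set.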

B. Classification Accuracy of Neural Network Classifier
The feature map is divided in a ratio of 2:1 into training and test data. The NN classifier achieves an accuracy of about 72%, which is better than that of the KNN classifier. Fig. 10 represents the overall result of the NN classifier for the Pashto letter recognition problem.
The efficiency of the classifier is tested for different sizes of training and test sets versus time. The data is split into (training, test) sets of (35%, 65%), (40%, 60%), (45%, 55%), (50%, 50%), (55%, 45%), (60%, 40%), (65%, 35%), (70%, 30%), (75%, 25%), and (80%, 20%). The corresponding time and accuracy results are shown in Fig. 10: as the training size increases, the accuracy of the system increases. However, increasing the training size adversely affects the simulation time. The highest accuracy of 72% is obtained with 80% of the data used for training and 20% for testing. Furthermore, the NN results based on varying numbers of epochs for different training and test sets are shown in Fig. 11. It is evident that as the number of epochs increases for given training and test sets, the accuracy of the system increases. The mean square error rate and gradient for handwritten Pashto letters are shown in Fig. 12.

VI. CONCLUSIONS AND FUTURE WORK
In this paper, an OCR system for automatic recognition of Pashto letters is developed using KNN and NN classifiers based on the zoning feature extraction technique. Experimental results show an accuracy of 70.07% for KNN and 72% for NN. Contributions include the provision of a handwritten Pashto letters database as a resource for future research work and the experimental results, which provide a baseline accuracy for future models tested on the data.
In the future, we aim to extend and evaluate our technique on a larger database of Pashto script, using a greater number of hidden layers coupled with different feature extraction techniques to achieve higher accuracy. Furthermore, our goal is to extend the proposed model to connected letters.