Automatic Recognition of Human Parasite Cysts on Microscopic Stools Images using Principal Component Analysis and Probabilistic Neural Network

Abstract: Parasites live in a host and obtain their food from, or at the expense of, that host. Cysts are a form of resistance and dissemination of parasites. Manual diagnosis from microscopic stool images is time-consuming and depends on the human expert. In this paper, we propose an automatic recognition system that identifies various intestinal parasite cysts from their digital microscopic images. We use image pixel features to train a probabilistic neural network (PNN), a type of network well suited to classification problems. The main novelty is the use of feature vectors extracted directly from the image pixels. To this end, microscopic images are first segmented to separate the parasite from the background. The extracted parasite is then resized to a 12x12 image, which forms the feature vector. For dimensionality reduction, projection onto a principal component analysis (PCA) basis is used: the 12x12 extracted features are orthogonalized into two principal component variables that constitute the input vector of the PNN. The PNN is trained using 540 microscopic images of the parasites. The proposed approach was tested successfully on 540 samples of protozoan cysts obtained from 9 kinds of intestinal parasites.


INTRODUCTION
Intestinal parasites are one kind of human parasite. They are a frequent cause of medical consultations in tropical countries, especially in underdeveloped countries. The number of people infected worldwide has been estimated at about 4 billion [1]. This pathology causes death or physical and mental disorders in children and immunodeficient individuals [1]. Parasitic diseases are diagnosed in the laboratory by examining stool samples under an optical microscope. Intestinal parasites are classified taxonomically into protozoa and helminths. Protozoa can be seen in stools either in the vegetative form or as resistant cysts. Helminths are found in stools in the form of eggs or larvae. A parasite is identified by comparing the observed morphology with known shapes. This practice is very tedious and strains the eyes of laboratory technicians. It is also time-consuming and subject to many diagnostic errors. The identification of amoebic cysts remains the most difficult. Indeed, cysts are smaller than helminth eggs, and their distinguishing criteria are more complex. Unlike helminths, for which size is a determining parameter, many amoeba cysts have almost the same dimensions, so other parameters, such as the number of nuclei, must be used to identify them.
During the last decades, several studies have employed microscopic image analysis to diagnose human parasites automatically [2; 3; 4]. Since many parasitic organisms present developmental stages with a well-defined and reasonably homogeneous morphology, they are amenable to pattern recognition techniques. Each study can be distinguished from the others by the parasite species concerned, the classification tools and the type of features used by the classifier. Yang et al. [2] addressed the identification of human helminth eggs with an artificial neural network (ANN). Avci et al. [3] and Dogantekin et al. [4] addressed the recognition of human helminth eggs using support vector machines (SVMs) and an adaptive network based fuzzy inference system, respectively. While these methods are limited to helminths, Castanon et al. [5] used Bayesian classification to identify seven species of Eimeria (a protozoan of the domestic fowl); Ginoris et al. [6; 7] used ANNs to recognize protozoa and metazoa that are typically found in sludge; and Widmer et al. [8] addressed the identification of Cryptosporidium oocysts and Giardia cysts in water using an ANN and immunofluorescence microscopy. These works, however, do not address the identification of human intestinal protozoa in feces, and the segmentation of the parasites is manual. Recently, Suzuki et al. [9] proposed a first solution for the automatic identification of the 15 most common species of protozoa and helminths in Brazil.
In this paper, we propose an automated method of human parasite diagnosis based on image analysis and an artificial neural network. Our approach relies on three main steps after image acquisition: edge detection, image segmentation and object recognition. In [10] and [11], we proposed a solution for the first two steps. The present work focuses on parasite recognition and builds on the results of those earlier works. The main difference between our method and previous parasite recognition methods is the type of feature descriptor used. Our feature descriptor uses the image pixels directly and does not need to compute other parameters from the image. Our classification tool combines principal component analysis (PCA) for dimensionality reduction with a probabilistic neural network. Our method was applied successfully to nine types of amoebic cysts.
The rest of the paper is organized as follows. In section 2, we describe the principles of the methodology. The parasite recognition algorithms are described in section 3. Experimental results are presented and discussed in section 4. Finally, section 5 presents the conclusions of the work.

A. Edge detection, image segmentation and parasite extraction
Parasite extraction is a crucial step preceding recognition. An image may contain multiple parasites, and we need to identify them individually. Object extraction consists of separating the image region of interest (ROI) from its background. This is done after the segmentation process. Common segmentation methods can be divided into two categories: region-based methods and contour-based methods. Region-oriented segmentation is based on the intrinsic properties of the objects to be extracted. It is highly dependent on the characteristics of the image and of the shape to extract. Contour-based methods seek the boundaries of the region to be extracted by exploiting discontinuities in intensity levels. The usual edge detection techniques are based on either the gradient or the Laplacian of the image intensity function. They include the Sobel, Roberts, Prewitt and Canny detectors. It is shown in [10] that edge detection based on the multi-scale wavelet transform performs better than the other classic detectors when applied to microscopy images of intestinal parasites. Another advantage of applying the wavelet transform to edge detection is the possibility of choosing the size of the details that will be detected [10].
The Hough transform (HT) can extract parametric shapes from an image. It has been applied to a wide variety of problems in machine vision, including line detection, circle detection, detection of general outlines, detection of surfaces and the estimation of 2-D and 3-D motion. The HT has been used in [12] to segment ultrasound images of longitudinal and transverse sections of the carotid artery. Certain intestinal parasites are circular in shape, and the circular Hough transform can easily be used to locate and extract them. However, shapes that are not circular are only partially located. Another segmentation method uses the active contour model, which is very effective for the detection of boundaries. An example of an active contour model implementation is presented in [13]. Its strong dependence on the initial contour has long been considered its main drawback: an initial contour close to the target contour promotes better convergence. It is possible to combine the Hough transform with the active contour technique. The Hough transform allows this combined method to locate the region of interest of the parasite automatically, and this first result serves as the initial contour for the active contour model. This approach was used successfully in [11] to extract intestinal parasites from microscopy images.
The first step is edge detection using the multi-scale wavelet transform. The second step computes the circular Hough transform of the edge map. In this step, each edge element votes for all the circles that it could lie on, and the circle corresponding to the maximum vote is retained. This circle locates the region of interest in which the parasite is situated. In the third step, the active contour based on the gradient vector flow model is computed. The active contour uses the circle obtained from the Hough transform as its initial contour. This initial contour is deformed and attracted towards the target contour by various forces that control the shape and location of the snake within the image. The last step uses the final contour to obtain the mask corresponding to the interior of the area delimited by the contour. The extracted parasite is the result of a logic operation between the mask and the original image. Besides the microscopic image, our scheme also takes as input the length of the radius used to find the parasite, the number of convolutions (analyzing scale) used to compute the edge map with the multi-scale wavelet transform, and the edge detection threshold. The detailed version of the parasite extraction algorithm can be found in [11].
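The voting stage of the second step can be illustrated with a minimal sketch. It assumes a binary edge map and a single known radius (the radius is one of the inputs listed above); the function name `circular_hough`, the `n_angles` parameter and the synthetic test circle are illustrative choices, not the implementation of [11].

```python
import numpy as np

def circular_hough(edge_map, radius, n_angles=180):
    """Vote for circle centres of a known radius.

    Returns the (row, col) cell with the most votes and the accumulator.
    """
    h, w = edge_map.shape
    acc = np.zeros((h, w), dtype=int)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    rows, cols = np.nonzero(edge_map)
    for r, c in zip(rows, cols):
        # Every candidate centre at distance `radius` from this edge pixel gets a vote.
        cr = np.rint(r - radius * np.sin(thetas)).astype(int)
        cc = np.rint(c - radius * np.cos(thetas)).astype(int)
        inside = (cr >= 0) & (cr < h) & (cc >= 0) & (cc < w)
        np.add.at(acc, (cr[inside], cc[inside]), 1)
    center = np.unravel_index(np.argmax(acc), acc.shape)
    return center, acc

# Synthetic edge map (assumption for illustration): one circle of radius 10.
edge = np.zeros((64, 64), dtype=bool)
t = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
edge[np.rint(30 + 10 * np.sin(t)).astype(int),
     np.rint(40 + 10 * np.cos(t)).astype(int)] = True
center, acc = circular_hough(edge, radius=10)
```

The retained circle (centre plus radius) would then serve as the initial contour of the active contour step.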

B. Feature extraction and dimensionality reduction
Feature extraction aims to represent candidate objects in a simple and representative manner so that one object can be discriminated from another. There are several types of feature descriptor [14]. The main difference between our method and previous parasite recognition methods is the type of feature descriptor used. Our feature descriptor uses the image pixels directly and does not need to compute other parameters from the image. The dimension of such vectors is very large. These vectors therefore need to be mapped to a lower-dimensional space, which reduces the computational complexity of the subsequent classification. Feature dimension reduction consists of mapping the input vectors of observations onto a lower-dimensional set of vectors $T_X$. The vectors of $T_X$ are obtained by a linear orthonormal projection, and reconstructed vectors are obtained by inverting (3). The mean square reconstruction error measures how far the reconstructed vectors are from the original ones. PCA is treated thoroughly in, for instance, [17].
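As a rough illustration of this projection, the sketch below computes a PCA basis from the eigenvectors of the sample covariance matrix, projects the data and measures the mean square reconstruction error. It is a common textbook formulation under stated assumptions: the data are centred explicitly on the sample mean, and the function names and the random stand-in data are hypothetical rather than taken from the paper.

```python
import numpy as np

def fit_pca(data, n_components=2):
    """data: (l, m) array with one training vector per row."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)            # sample covariance, shape (m, m)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]              # columns = leading eigenvectors

def project(data, mean, M):
    """Linear orthonormal projection onto the retained components."""
    return (data - mean) @ M

def reconstruct(features, mean, M):
    """Back-projection into the original m-dimensional space."""
    return features @ M.T + mean

def reconstruction_error(data, mean, M):
    """Mean square reconstruction error over the training set."""
    residual = data - reconstruct(project(data, mean, M), mean, M)
    return float(np.mean(np.sum(residual ** 2, axis=1)))

# Stand-in data: 90 random vectors of dimension 144 (a flattened 12x12 image).
rng = np.random.default_rng(0)
data = rng.normal(size=(90, 144))
mean, M = fit_pca(data, n_components=2)
features = project(data, mean, M)               # shape (90, 2)
```

Keeping more components can only lower the reconstruction error, so the number of components is a trade-off between compactness and fidelity.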

C. Object classification using the Artificial Neural Network
Artificial Neural Networks are analogous to their namesake, biological neural networks, in that both receive multiple inputs and respond with a single output. An ANN assigns an input vector to the class that has the maximum probability of being correct. These networks have diverse applications in machine vision [18]. One of their applications is classification and decision making based on existing data [19]. For this purpose, the input data is often divided into two parts, one for training and one for testing.
In an ANN, multiple neurons are interconnected to form a network and facilitate distributed computing. Each group of neurons constitutes a layer. Networks may contain an input layer, an output layer and a hidden inner layer. Additional hidden layers may be added to increase the complexity of the network. Weights are assigned to each of the links between neurons, and they are updated as part of the learning process.
The configuration of the interconnections can be described efficiently with a directed graph. A directed graph consists of nodes and directed arcs. The topology of the graph can be categorized as either acyclic or cyclic [20; 21]. A neural network with acyclic topology has no feedback loops. Such an acyclic neural network is often used to approximate a nonlinear mapping between its inputs and outputs. A neural network with cyclic topology contains at least one cycle formed by directed arcs. Such a neural network is also known as a recurrent network. Due to the feedback loop, a recurrent network leads to a nonlinear dynamic system model that contains internal memory. Recurrent neural networks often exhibit complex behaviors and remain an active research topic in the field of artificial neural networks. The network topology was chosen systematically in terms of the variance of the classification results and the complexity of the network.
The Probabilistic Neural Network (PNN) is used in this paper to classify the type of intestinal parasite. This choice is motivated by its advantages [22]. The main advantage of the PNN is that training is easy and instantaneous [23; 24; 25]. The architecture of this system is shown in Fig. 1. We adopt the symbols and notations used in the book "neural network toolbox for use in Matlab" [26]. The input layer has two units, corresponding to the two features. The input vector, denoted as P, is presented as the black vertical bar in Fig. 1. Its number of elements R corresponds to the number of neurons in the Radial Basis Layer.
In the Radial Basis Layer, the vector distances between the input vector P and the weight vector W are calculated. This operation is represented in Fig. 1. The competitive layer assigns each input to one of the K classes of protozoan cysts used during the training phase. There is no bias in the competitive layer. In the competitive layer, the vector a is first multiplied by the layer weight matrix LW, producing an output vector m. The competitive function, denoted as C in Fig. 1, produces a 1 corresponding to the largest element of m, and 0's elsewhere. The output vector of the competitive function is denoted as S. The index of the 1 in the output identifies the parasite class recognized by our system. It can be used as the index to look up the scientific name of the parasite. The dimension of the output vector is K=9 in this paper.

Our feature descriptor uses the image pixels and does not need to compute other parameters from the image. Nevertheless, the image needs to be resized. Every extracted parasite image has been reduced to a size of 12x12. The resizing uses the bicubic interpolation method. With this method, the output pixel value is a weighted average of the pixels in the nearest 4-by-4 neighborhood.
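To make the "weighted average over a 4-by-4 neighborhood" concrete, here is a minimal sketch of separable cubic convolution resizing followed by flattening into a 144-element feature vector. It assumes the Keys kernel with a = -0.5 and applies no anti-aliasing, so it will not reproduce a toolbox resize function exactly; all names and the random stand-in image are hypothetical.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an assumed, common choice)."""
    x = np.abs(x)
    near = (a + 2) * x**3 - (a + 3) * x**2 + 1
    far = a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return np.where(x <= 1, near, np.where(x < 2, far, 0.0))

def resize_weights(n_in, n_out):
    """(n_out, n_in) matrix: each row holds the 4 cubic weights of one output sample."""
    # Centre of each output pixel, expressed in input pixel coordinates.
    centers = (np.arange(n_out) + 0.5) * n_in / n_out - 0.5
    base = np.floor(centers).astype(int)
    weights = np.zeros((n_out, n_in))
    for k in range(-1, 3):                       # the 4 nearest input samples
        idx = base + k
        w = cubic_kernel(centers - idx)
        idx = np.clip(idx, 0, n_in - 1)          # replicate border pixels
        np.add.at(weights, (np.arange(n_out), idx), w)
    return weights

def bicubic_resize(image, out_shape=(12, 12)):
    """Separable resize: 4 row weights x 4 column weights = 4-by-4 neighborhood."""
    rows = resize_weights(image.shape[0], out_shape[0])
    cols = resize_weights(image.shape[1], out_shape[1])
    return rows @ image @ cols.T

# Stand-in for an extracted parasite image (grayscale, arbitrary size).
rng = np.random.default_rng(1)
parasite = rng.random((57, 43))
feature_vector = bicubic_resize(parasite).ravel()   # 12 * 12 = 144 values
```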

A. Step 1: image edge detection, segmentation and parasite extraction
In order to reduce the complexity of the classification and recognition phase, the number of features used as input needs to be reduced. PCA is used to project the 12x12 features onto a new feature space. The goal of PCA is to reduce the dimensionality of the data space (12x12) to the smaller intrinsic dimensionality of a feature space that describes the data economically. For this purpose, the dimensionality of the new feature space is chosen according to the classification rate and the system complexity. In this paper, the first 2 principal components have been used. When using our algorithm, one can use the mapping

C. Step 3: parasite classification and recognition
In this phase, the reduced features are used to train and test the neural network. A total of 1080 microscopic images of protozoan cysts are used. These samples were divided into 540 images for training, with 60 images for each of the 9 parasite types, and 540 images for testing. All the parasite cysts were rotated in steps of 30° from 0° to 150°, and five different scales were chosen. These images were also corrupted with 4 types of image noise ('Gaussian', 'Poisson', 'salt & pepper', 'speckle'). In this way, 120 microscopic images were obtained for each type of parasite cyst. A randomly chosen half of the database was used to train the classifier. After several experiments, the spread parameter was set to 0.4, which gave the best performance.
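The counts above can be checked with a short enumeration. This is one reading that reproduces the stated figure of 120 images per class (6 rotations x 5 scales x 4 noise types); the scale values are placeholders, since the text only says that five scales were used.

```python
from itertools import product

ROTATIONS_DEG = range(0, 151, 30)            # 0, 30, ..., 150 -> 6 angles
SCALES = (0.8, 0.9, 1.0, 1.1, 1.2)           # hypothetical values, 5 scales
NOISE_TYPES = ("gaussian", "poisson", "salt & pepper", "speckle")
N_CLASSES = 9

# One variant per (rotation, scale, noise) combination.
variants = list(product(ROTATIONS_DEG, SCALES, NOISE_TYPES))
per_class = len(variants)                    # images per parasite type
total = per_class * N_CLASSES                # size of the whole database
train_per_class = per_class // 2             # half for training, half for testing
test_per_class = per_class - train_per_class
```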
The rest of the database was used in the testing stage. For each kind of parasite, 60 cyst images are used to test the accuracy of our algorithm. The confusion matrix of the proposed expert diagnosis system is shown in Table 1. As shown in Table 1, the images of all nine parasite cyst types were classified with a 100 % correct classification rate. This demonstrates the effectiveness of the proposed feature descriptor in a protozoan cyst recognition system based on principal component analysis and the probabilistic neural network.

This paper presented an image analysis system that recognizes nine kinds of human parasites. An Artificial Neural Network can automatically classify intestinal parasites from microscopic images of stools acquired with a digital camera or scanner. The probabilistic neural network was adopted because it trains quickly and has a simple structure. Contrary to previous work, the 12x12 feature vectors were obtained directly from the image pixels. These features, which form the input vector of the PNN, were projected onto a Principal Component Analysis basis for dimensionality reduction.
Results indicate that using the image pixels of microscopic images of human parasites as features is feasible, and that these features give a 100 % correct classification rate on our test set when used in a probabilistic neural network classifier after dimensionality reduction by projection onto the PCA basis. Further work will focus on the recognition of other types of human parasites.
with $n < m$. Dimensionality reduction based on the projection technique is considered in this paper. The projection is achieved by using a transformation matrix M. Principal Component Analysis (PCA) is a representative unsupervised learning method that yields a linear projection [15; 16]. Consider an input vector $I \in \mathbb{R}^{m}$ that needs to be mapped onto a new feature description, and a set of training vectors from the m-dimensional input space $\mathbb{R}^{m}$.
where the matrix M and the vector k are the parameters of the projection. The reconstructed vector is a function of the parameters of the linear projections (3) and (4). Principal component analysis is the linear orthonormal projection (3) that yields the minimal mean square reconstruction error (5) of the training data $T_I$. The parameters $(M, k)$ of the linear projection are the solution of an optimization task, whose solution is the matrix M containing the eigenvectors of the sample covariance matrix that have the largest eigenvalues. The vector k equals $M^{T}\mu$, where $\mu$ is the sample mean of the training data.

Fig. 1. Neural network architecture.

Our PNN has three layers: the Input Layer, the Radial Basis Layer and the Competitive Layer.
The distance computation is represented by the dist box in Fig. 1, and its output gives $\|W - P\|$. The bias vector b and the output $\|W - P\|$ are combined by an element-by-element multiplication, represented as ".*" in Fig. 1. The result is denoted as $n = \|W - P\| \mathbin{.*} b$. The transfer function in the radial basis network is defined as

$$\mathrm{radbas}(n) = \exp(-n^{2}) \qquad (7)$$

Each element of n is substituted into (7) and produces the corresponding element of a, the output vector of the Radial Basis Layer. We can represent the i-th element of a as $a_i = \mathrm{radbas}(\|W_i - P\| \cdot b_i)$, where $W_i$ is the vector made of the i-th row of W and $b_i$ is the i-th element of the bias vector b. An input vector close to a training vector is represented by a number close to 1 in the output vector a. If an input is close to several training vectors of a single class, it is represented by several elements of a that are close to 1.
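The two PNN layers can be sketched in a few lines. This is a simplified illustration, not the trained network of this paper: the rows of W are the stored training vectors, LW is represented implicitly by summing the activations of each class, and the bias is set to sqrt(-ln 0.5)/spread, the convention of MATLAB's `newpnn`, which is assumed here. The function name and the toy data are hypothetical.

```python
import math

def pnn_classify(p, W, labels, n_classes, spread=0.4):
    """Return (winning class index, one-hot output S) for input vector p."""
    # Assumed bias convention: radbas output is 0.5 at distance == spread.
    b = math.sqrt(-math.log(0.5)) / spread
    # Radial Basis Layer: a_i = exp(-(||W_i - p|| * b)^2)
    a = [math.exp(-(math.dist(w, p) * b) ** 2) for w in W]
    # Competitive Layer: m = LW * a, where row k of LW selects class k.
    m = [0.0] * n_classes
    for a_i, k in zip(a, labels):
        m[k] += a_i
    # Competitive function C: 1 at the largest element of m, 0 elsewhere.
    winner = max(range(n_classes), key=m.__getitem__)
    S = [1 if k == winner else 0 for k in range(n_classes)]
    return winner, S

# Toy training set in a 2-D feature space (two principal components).
W = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (2.1, 2.0)]
labels = [0, 0, 1, 1]
winner, S = pnn_classify((0.05, 0.1), W, labels, n_classes=2)
```

Because training only stores W and the class labels, adding a new training sample amounts to appending one row, which is why PNN training is described as instantaneous.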

Fig. 2. The overall processing stages of the proposed scheme for intestinal human parasite diagnosis.

The main goal of this work is to recognize the types of intestinal parasites in microscopic images of stools. To achieve this, a block diagram is designed based on this type of dataset. Specifically, we focus on the classification and recognition of nine types of protozoan cysts. The block diagram of the proposed scheme is illustrated in Fig. 2.

Fig. 3. Representative illustration of microscopic image preprocessing. (a) Microscopic image of stools, (b) image of the detected edges, (c) image of the extracted parasites.

An image can contain several objects that are not necessarily of interest. An image can also contain different parasites. We need to extract each parasite individually before recognition. Parasite extraction is done via image segmentation. The segmentation uses the contour edge map of the image to separate the parasite from its background. A logic operation is used to suppress the parasite background. Our segmentation method is based on the active contour initialized by the Hough transform. The edge detection method

the values of the components in the new coordinate system. Fig. 4 shows the result obtained with the two-dimensional projection onto the two eigenvectors corresponding to the two largest eigenvalues in the PCA basis. This transformation is applied to 20 training images of each of the nine types of parasite. In this figure, the nine types of parasite can be distinguished, and each type of parasite is grouped separately from the others. The capability of the new features to separate the 9 classes of parasites can thus be evaluated qualitatively.

TABLE I. CONFUSION MATRIX OF THE CLASSIFICATION SCHEME