Automated Periodontal Diseases Classification System

Abstract—This paper presents an efficient and innovative system for the automated classification of periodontal diseases. The strength of our technique lies in the fact that it incorporates knowledge from the patients' clinical data along with features automatically extracted from Haematoxylin and Eosin (H&E) stained microscopic images. Our system uses image processing techniques based on colour deconvolution, morphological operations, and watershed transforms for epithelium and connective tissue segmentation, nuclear segmentation, and extraction of the microscopic immunohistochemical features for the nuclei, dilated blood vessels, and collagen fibers. Feedforward backpropagation artificial neural networks are then used for the classification process. We report 100% classification accuracy in correctly identifying the different periodontal diseases observed in our 30-sample dataset.


I. INTRODUCTION

Periodontitis is a chronic inflammatory disease of the vascularized supporting tissues of the teeth [1]. Periodontal disease occurs when inflammation or infection affects the gingiva and extends to the periodontal apparatus [2]. The 1999 classification system for periodontal diseases and conditions listed seven major categories of periodontal diseases [3,4]: Gingivitis, Chronic periodontitis, Aggressive periodontitis, Periodontitis as a manifestation of systemic disease, Necrotizing ulcerative gingivitis/periodontitis, Abscesses of the periodontium, and Combined periodontic-endodontic lesions. The latter four are associated with systemic diseases [5,6,7,8,9]. Hence, this work focuses primarily on identifying the different types of periodontal diseases using a computer-assisted microscopy system for automated classification, in order to increase accuracy and reduce the workload involved in classifying and diagnosing the different categories of periodontal diseases, which in turn helps in designing the treatment plan.
The paper is organized as follows: Section II explains the materials and methods used to obtain our dataset, Section III describes the implementation details of our proposed system, Section IV presents the experimental results and discusses them, and Section V concludes the paper and outlines future work.

II. MATERIALS AND METHODS

A. Study Cases
The study was conducted on 32 patients attending the Oral Medicine, Periodontology, Oral Diagnosis and Radiology Department, Faculty of Dental Medicine for Girls, Al-Azhar University, between February 2009 and March 2011.
Patients were suffering from gingival inflammation, which may extend to include the periodontium (different types of periodontitis), or from gingival overgrowth (due to different etiological factors).
Patients were excluded from the present study if they were smokers, pregnant, or post-menopausal women. All selected patients had not undergone any periodontal therapy for at least six months.

B. Collected Clinical Data
The following clinical parameters were collected and recorded at six sites per tooth, with all linear measurements recorded to the nearest 0.5 mm using a Williams graduated periodontal probe: Plaque Index (PI) [10], Pocket Depth (PD), and Clinical Attachment Level (CAL), measured from the CEJ to the most apical part of the sulcus. All included patients were indicated for surgical flaps as part of their treatment.

C. Histopathological Sample Preparation
After the surgical procedures, the excised tissue samples were immersed in 10% formalin and decalcified in multiple baths of 10% trichloroacetic acid. The blocks were embedded in paraffin, and semi-serial 4 µm histologic sections were stained with Haematoxylin and Eosin (H&E).

E. Periodontal Diseases Groups
The 32 patients were classified into 4 diagnostic groups according to their periodontal status: 16 patients were classified into the Gingival Enlargement group, 7 patients into the Chronic Gingivitis group, another 7 patients into the Chronic Periodontitis group, and the last 2 patients into the Aggressive Periodontitis group.
We excluded the patients classified into the Aggressive Periodontitis group from our classification experiments, because the very small number of study cases in this group was not sufficient for our neural-network-based classification system to work with. We did, however, include these 2 cases in our preprocessing and feature extraction experiments, to make sure that our proposed system is generic enough to handle all the different types of diseases.
We also added one study case from a healthy person to our preprocessing and feature extraction experiments.

III. PROPOSED SYSTEM
Our proposed system can be divided into several well-defined stages, as illustrated in Figure 1. We first obtain the H&E stained slide, apply preprocessing steps to remove the background and undesired objects, segment the tissue into its main components (epithelium and connective tissue), extract features from both parts, and then use these features along with the patient's clinical data to start the classification process.

A. Pre-processing
Images consist mainly of two parts: tissue (epithelium + connective tissue) and background. In this preprocessing phase, we need to remove the background areas from the image, so that further processing is done only on the tissue part.
We implemented 2 different algorithms to achieve this task; the first one is fully automated, while the second one is semi-automated because it requires a few clicks from the user to specify seed points in the background area. The pseudo code for the fully-automated and semi-automated algorithms is shown in Pseudocode I and Pseudocode II respectively. Figure 2 shows one sample result of the 2 algorithms. Sensitivity and specificity results for these algorithms are given in the Results and Discussion section.
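To make the flavour of the fully-automated variant concrete, here is a minimal sketch in Python, assuming scikit-image and SciPy. The choice of the L*a*b* "a" channel follows the pseudocode given later in the paper, but the Otsu threshold, window sizes, and size limits below are illustrative placeholders rather than the exact parameters of Pseudocode I.

from scipy import ndimage as ndi
from skimage import color, filters, morphology

def tissue_mask(rgb):
    """Rough fully-automated background removal (illustrative parameters):
    threshold the L*a*b* 'a' channel and clean up the binary result."""
    a = color.rgb2lab(rgb)[..., 1]                       # 'a' (green-red) channel
    a = ndi.uniform_filter(a, size=5)                    # 5x5 average smoothing
    mask = a > filters.threshold_otsu(a)                 # H&E tissue is redder than the white background
    mask = morphology.remove_small_objects(mask, 1000)   # drop small debris
    mask = morphology.remove_small_holes(mask, 1000)     # fill small holes in the tissue
    mask = morphology.binary_closing(mask, morphology.square(10))
    return mask

# Example: keep only the tissue pixels of an RGB slide image.
# tissue_only = slide_rgb * tissue_mask(slide_rgb)[..., None]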

B. Epithelium Segmentation
In this phase of our system, we need to segment the tissue into epithelium and connective tissue. Epithelium segmentation has been targeted in several studies [15,16,17]. As stated in [15], there are many challenges in this task, mainly caused by staining artifacts, lighting acquisition conditions, and undesired touching objects. Some studies [15] used the saturation channel of the HSV color space, others [16] used multiple gray-level automated thresholding over the image's green channel, while others [17] used a region growing algorithm [18] over the gray channel of the image.
We found that 73% of our samples give better results when using the HSV saturation channel, while the remaining 27% give better results when using the RGB green channel. We implemented an automated way to perform this segmentation; Pseudocode III describes our epithelium segmentation algorithm.
We also implemented a post-processing step to manually move some parts from epithelium to connective tissue, or vice versa, because our proposed algorithm did not achieve highly accurate results for all study cases: in images with a high inflammatory infiltrate, the brightness of the infiltrate was similar to that of the epithelium [16]. Information about the sensitivity and specificity of our algorithm is given in the Results and Discussion section. Figure 3 shows one sample result of this algorithm.
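For illustration, a minimal sketch of the saturation-channel route (in the spirit of Pseudocode III) using scikit-image and SciPy: the contrast adjustment, 10 x 10 median filter, and Otsu thresholding follow the pseudocode, while the clean-up sizes and the assumption that epithelium is the brighter class are placeholders.

import numpy as np
from scipy import ndimage as ndi
from skimage import color, filters, morphology

def epithelium_mask(rgb_tissue):
    """Sketch of the saturation-channel route: contrast-stretch, denoise,
    Otsu-threshold, then tidy the mask. Parameters are illustrative."""
    sat = color.rgb2hsv(rgb_tissue)[..., 1]                 # HSV saturation channel
    lo, hi = np.percentile(sat, (1, 99))                    # saturate 1% of low/high values
    sat = np.clip((sat - lo) / (hi - lo + 1e-8), 0, 1)      # contrast adjustment
    sat = ndi.median_filter(sat, size=10)                   # 10x10 median filter
    mask = sat > filters.threshold_otsu(sat)                # assumes epithelium is the brighter class
    mask = morphology.remove_small_objects(mask, 1000)      # drop small white objects
    mask = morphology.remove_small_holes(mask, 1000)        # fill small black holes
    return mask

# connective = tissue_mask(rgb) & ~epithelium_mask(rgb)     # the remainder of the tissue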

C. Feature Extraction
We extracted a set of features from our Haematoxylin and Eosin (H&E) stained slides that we expected to be helpful in the later classification process. Below are the extracted features, along with the algorithms used to extract them:

1) Epithelium Percentage
After segmentation of the tissue image into its components (Epithelium and Connective Tissue) in the previous phase, we calculated the percentage of the Epithelium part.
(Note that the background regions are already excluded)
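As a minimal sketch, assuming the boolean masks epi_mask and ct_mask produced by the segmentation stage above, the feature is a single ratio:

# Epithelium percentage of the tissue area (background already excluded);
# epi_mask and ct_mask are the boolean masks from the segmentation stage.
epithelium_pct = 100.0 * epi_mask.sum() / (epi_mask.sum() + ct_mask.sum())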

2) Nuclei in Epithelium
We segmented the nuclei found in the epithelium part, and used their count and their percentage of the whole epithelium area in our feature list. Nuclei segmentation has been targeted in several studies: [11], [21], [22] use nuclear localization based on the colour deconvolution algorithm developed by [23] to obtain the optical density of the Haematoxylin stain alone, plus a spatial partition of the epithelial compartment representing the exclusive area of influence of each nucleus profile; [24] is based on the multiscale Laplacian-of-Gaussian (LoG) filter; [25] depends on ellipse fitting and the watershed transform; and [26] relies on morphology-based techniques. In our proposed system, we used the steps in Pseudocode IV to segment the nuclei in the epithelium part of the image; Figure 4 shows two sample results of this algorithm.
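A rough sketch of this idea, assuming scikit-image's rgb2hed colour deconvolution as a stand-in for [23]: the 15 to 800 pixel size limits and the value 30 follow Pseudocode IV, but the h-extrema call, the channel rescaling, and the omission of the ellipse-fitting step are simplifications.

import numpy as np
from scipy import ndimage as ndi
from skimage import color, exposure, measure, morphology

def epithelium_nuclei(rgb_epithelium, epi_mask):
    """Sketch: colour-deconvolve H&E, take the haematoxylin channel, mark
    stain-dense blobs with an h-extrema transform, and keep nucleus-sized
    objects (15-800 px, as in the paper); ellipse fitting is omitted."""
    hema = color.rgb2hed(rgb_epithelium)[..., 0]            # haematoxylin density
    hema = exposure.rescale_intensity(hema, out_range=(0, 255)).astype(np.uint8)
    blobs = morphology.h_maxima(hema, 30).astype(bool)      # extended-extrema on the stain channel
    blobs &= epi_mask                                        # stay inside the epithelium
    blobs = ndi.binary_fill_holes(blobs)
    blobs = morphology.remove_small_objects(blobs, 15)      # < 15 px: noise
    labels = measure.label(blobs)
    keep = [r.label for r in measure.regionprops(labels) if r.area <= 800]
    nuclei = np.isin(labels, keep)                           # > 800 px: not a single nucleus
    count = len(keep)
    pct = 100.0 * nuclei.sum() / max(epi_mask.sum(), 1)
    return nuclei, count, pct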

3) Nuclei in Connective Tissue
We segmented the nuclei found in the connective tissue component, and used their count and their percentage of the whole connective tissue area in our feature list.
We do this by computing 2 masks: the first represents the light areas in the connective tissue (blood vessels), while the second represents the dark areas (nuclei). The nuclei mask is then filtered to exclude the nuclei that belong to dilated blood vessels. Our steps for this segmentation are described in Pseudocode V. Figure 5 shows one sample result of this algorithm.
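A minimal sketch of the two-mask idea, again assuming scikit-image and SciPy: the CLAHE step, 3 x 3 median filter, and colour deconvolution mirror Pseudocode V, while the 0.8 and Otsu thresholds are illustrative; the step that removes nuclei belonging to dilated vessels is shown in the next subsection's sketch.

from scipy import ndimage as ndi
from skimage import color, exposure, filters

def connective_tissue_masks(rgb_ct, ct_mask):
    """Sketch of the two-mask idea: Mask 1 marks the light areas (dilated
    vessel lumina), Mask 2 marks the dark, haematoxylin-dense areas (nuclei).
    Threshold values are illustrative, not the paper's exact parameters."""
    green = rgb_ct[..., 1] / 255.0                           # assumes an 8-bit RGB image
    green = exposure.equalize_adapthist(green)               # CLAHE contrast enhancement
    green = ndi.median_filter(green, size=3)                 # 3x3 median filter
    light_mask = (green > 0.8) & ct_mask                     # Mask 1: bright lumina candidates

    hema = color.rgb2hed(rgb_ct)[..., 0]                     # haematoxylin density via deconvolution
    hema = exposure.rescale_intensity(hema, out_range=(0.0, 1.0))
    dark_mask = (hema > filters.threshold_otsu(hema)) & ct_mask   # Mask 2: nuclei
    return light_mask, dark_mask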

4) Dilated Blood Vessels
We segmented the dilated blood vessels found in the connective tissue part, and used their count and their percentage of the whole connective tissue area in our feature list.
We do this with almost the same steps described in Pseudocode V for connective tissue nuclei segmentation: we obtain the first mask with exactly the same steps, while excluding steps e, f, and j from the ones required to get the second mask. To get the final mask, we keep only the parts of Mask 1 (dilated blood vessel candidates) that intersect at least 3 times with parts of Mask 2 (nuclei); finally, we count and compute the percentage of all objects in Mask 3. Figure 6 shows one sample result of this algorithm.
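The distinguishing step for this feature is the intersection test above. A sketch of it, assuming the light-area and nuclei masks from the previous sketch; here "intersects" is approximated by overlap after a small dilation of each candidate.

import numpy as np
from skimage import measure
from skimage.morphology import binary_dilation, disk

def dilated_vessels(light_mask, nuclei_mask, min_touching_nuclei=3):
    """Sketch of the vessel-filtering rule: keep a bright-area candidate only
    if it touches at least `min_touching_nuclei` distinct nuclei."""
    cand_labels = measure.label(light_mask)
    nuc_labels = measure.label(nuclei_mask)
    vessels = np.zeros_like(light_mask, dtype=bool)
    for region in measure.regionprops(cand_labels):
        cand = binary_dilation(cand_labels == region.label, disk(2))
        touched = np.unique(nuc_labels[cand])
        touched = touched[touched != 0]                  # drop the background label
        if touched.size >= min_touching_nuclei:          # lined by enough nuclei -> a vessel
            vessels |= (cand_labels == region.label)
    return vessels

# vessel_mask  = dilated_vessels(light_mask, dark_mask)
# vessel_count = measure.label(vessel_mask).max()
# vessel_pct   = 100.0 * vessel_mask.sum() / ct_mask.sum()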

5) Collagen Fiber
We segmented the collagen fibers found in the connective tissue part, and used their percentage of the whole connective tissue area in our feature list.
We do this by computing 2 masks: the first represents the light areas in the connective tissue and the second represents the dark areas; the connective tissue area remaining after excluding both masks is then taken as collagen. Our steps for this segmentation are described in Pseudocode VI. Figure 7 shows one sample result of this algorithm.
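Under the same assumptions as the previous sketches (masks light_mask, dark_mask, and ct_mask), the collagen estimate reduces to mask subtraction:

# Collagen: connective tissue that is neither a bright (vessel) area nor a
# dark (nuclear) area; reuses the masks from the earlier sketches.
collagen_mask = ct_mask & ~light_mask & ~dark_mask
collagen_pct = 100.0 * collagen_mask.sum() / max(ct_mask.sum(), 1)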

D. Classification
After extracting all the features, we used feedforward backpropagation artificial neural networks in the classification process. We divided our samples randomly into 3 groups: 20 samples for the training set, which were presented to the neural network during training, with the network adjusted according to its error; 5 samples for the validation set, used to measure network generalization and to halt training when generalization stops improving; and the last 5 samples for the test set, which had no effect on training and therefore provide an independent measure of network performance during and after training.
We wrote a small algorithm to make sure that each data set contains a proportional number of study cases from each disease group. We used the scaled conjugate gradient backpropagation function for network training, and mean squared error as our performance function.

IV. RESULTS AND DISCUSSION

For the background removal preprocessing phase, we applied both the fully-automated and the semi-automated algorithms described in Pseudocode I and Pseudocode II respectively. We considered the semi-automated results as the ground truth and compared the results of the fully-automated algorithm against them; over our 33 samples, we achieved a sensitivity of 68.42% and a specificity of 98.56%. Six of our samples actually contain no background at all, and when removing their results from the statistics, we achieved a sensitivity of 83.33% and a specificity of 98.66% over the remaining 27 samples.

For the epithelium segmentation phase, we applied the algorithm described in Pseudocode III over our data; since the results of this step have to be accurate, as they are used in all later phases, we also implemented a manual way of moving parts from the epithelium to the connective tissue or vice versa. Considering the results after manual processing as the ground truth, our algorithm achieved a sensitivity of 84.99% and a specificity of 88.40% over our 33-sample dataset.

For the nuclei in epithelium segmentation phase, we applied the algorithm described in Pseudocode IV over our data; we found that the optical density of the Haematoxylin stain retrieved through colour deconvolution [23] is the best channel for nuclei segmentation in 24 samples (73%), while the RGB Red channel is the best for the remaining 9 samples (27%). Information regarding the average count, area, and percentage of nuclei found in the epithelium for each disease group can be found in Table I.

For the nuclei in connective tissue segmentation phase, we applied the algorithm described in Pseudocode V over our data; information regarding the average count, area, and percentage of nuclei found in the connective tissue for each disease group can be found in Table II. For the dilated blood vessels segmentation phase, the resulting statistics are described in Table III. For the collagen fiber segmentation phase, we applied the algorithm described in Pseudocode VI over our data; information regarding the average area and percentage of collagen fibers for each disease group can be found in Table IV.

For our classification results, Figure 9 shows our neural network training performance, Figure 10 shows the confusion matrices for the training, validation, test, and whole datasets, and Figure 11 shows the Receiver Operating Characteristic (ROC) curves for them.

Figure 1
Figure 1 Schematic structure of the proposed system

PSEUDOCODE III. EPITHELIUM SEGMENTATION ALGORITHM

1. Get the "Saturation" channel from the HSV color space of the tissue image, Image 1, which is retrieved from the preprocessing phase, Channel 1. (a)
2. Adjust intensity values in the grayscale Channel 1, mapping them to new values such that 1% of the data is saturated at the low and high intensities of Channel 1; this increases the contrast of the output channel, Channel 2 [15]. (b)
3. Apply a 10 × 10 median filter to remove noise in Channel 2 without harming edges, Channel 3 [19]. (c)
4. Convert Channel 3 to black & white by applying global thresholding through Otsu's method [20], Mask 1. (d)
5. Fill holes of Mask 1, Mask 2.
6. Perform a morphological erosion with a 10 × 10 window on Mask 2, Mask 3.
7. Morphologically open Mask 3 to remove all white objects that have fewer than 1000 pixels, Mask 4.
8. Perform a morphological dilation with a 20 × 20 window on Mask 4, Mask 5.
9. Morphologically open Mask 5 to remove all black objects that have fewer than 1000 pixels, Mask 6.
10. Perform a morphological erosion with a 10 × 10 window on Mask 6, Mask 7.
11. Edge shifting for the mask, by performing a morphological erosion with a 2 × 2 structuring element on Mask 7, Mask 8 [13], [14].

(a) For 9 of our 33 samples, we used the RGB Green channel instead of the HSV Saturation channel, because we found it gave better results.
(b) For 1 sample, we replaced the image adjustment step with histogram equalization.
(c) For 6 samples, we used a 20 × 20 window for median filtering instead of the 10 × 10 window, and for another sample we used a 9 × 9 window.
(d) For 12 samples, we applied histogram equalization after step 4.

PSEUDOCODE IV. EPITHELIUM NUCLEI SEGMENTATION ALGORITHM

1. Get the optical density of the Haematoxylin as an 8-bit grayscale image, by colour deconvolution [23] of the epithelium part retrieved from the epithelium segmentation phase, Channel 1. (a)
2. Normalize intensity values in the grayscale Channel 1, changing the range of pixel intensity values to be between 0 and 255, Channel 2. (b)
3. Convert Channel 2 to black & white by computing the extended-minima transform, which is the regional minima of the H-minima transform [27], with a value of 30, Mask 1.
4. Fill holes of Mask 1, Mask 2.
5. Remove areas from Mask 2 that are not on the border, Mask 3.
6. Morphologically open Mask 3 to remove all objects that have fewer than 15 pixels, Mask 4.
7. Morphologically open Mask 4 to remove all objects that have more than 800 pixels, Mask 5.
8. Convert each connected object to the ellipse that fits it, Mask 6.
9. Finally, count and compute the percentage of all objects in Mask 6.

(a) For 9 of our 33 samples, we used the RGB Red channel instead of the Haematoxylin channel, because we found it gave better results.
(b) For 3 samples, we also applied image adjustment after step 2.

Figure 4
Figure 4 Nuclei in Epithelium. (A), (C) Two samples of epithelium images. (B), (D) After applying the epithelium nuclei segmentation algorithm over (A) and (C) respectively. [Nuclei are surrounded by blue ellipses].

Figure 5

Figure 5 Nuclei in Connective Tissue. (A) Sample of a connective tissue image. (B) Nuclei in the connective tissue surrounded by black ellipses. (C) Mask 1 is in blue, Mask 2 is in red and yellow, and the Final Mask is in yellow.

Figure 6

Figure 6 Dilated Blood Vessels.
the "Green" channel of the connective tissue part, retrieved from the epithelium segmentation phase, Channel 1 .b. Enhance the contrast of the grayscale channel Channel 1_1 by transforming the values using contrastlimited adaptive histogram equalization (CLAHE) [28], Channel 1_1 .c. Apply 3 • 3 median filter for removing the noise [19] in the Channel 1_1 without harming edges, Channel 1_2 .d. Convert Channel 1_2 to black & white with 0.75 threshold value, Mask 1 .2. Get Mask 2 a. Get the optical density of the Haematoxylin as an 8 bit grayscale image, by colour deconvolution [23] of the connective tissue part, retrieved from the epithelium segmentation phase, Channel 2 .b. Adjust intensity values in the grayscale Channel 2 , to map it to new values in such that 1% of data is saturated at low and high intensities of the Channel 2 .This increases the contrast of the output channel, Channel 2_1 .[15].c.Apply 2-D adaptive filtering with a 5 • 5 window size over Channel 2_1 , Channel 2_2 .d. Convert Channel 2_2 to black & white by applying global thresholding algorithm using the value got from Otsu's method [20] multiplied by 1.1, and then invert it, Mask 2 .3. Get Final Mask by excluding Mask 1 and Mask 2 from the original connective tissue area, Mask 3 .4. Finally we get the percentage of all objects in Mask 3 .
We used 11 inputs: 3 clinical parameters (Plaque Index, Pocket Depth, and Attachment Level) and 8 extracted features (Epithelium Percentage, Epithelium Nuclei Count & Percentage, Connective Tissue Nuclei Count & Percentage, Dilated Blood Vessels Count & Percentage, and Collagen Fiber Percentage), 10 hidden neurons, and 3 outputs representing our 3 disease groups (Gingival Enlargement, Chronic Gingivitis, and Chronic Periodontitis); the neural network model used is shown in Figure 8.
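The original classifier is a feedforward backpropagation network trained with scaled conjugate gradient, presumably in MATLAB. As an illustrative stand-in only, the same 11-input, 10-hidden-unit, 3-class architecture can be sketched with scikit-learn; the feature file names, the L-BFGS solver (scikit-learn has no scaled conjugate gradient trainer), and the single train/test split are assumptions, and with roughly 30 samples the measured accuracy is very sensitive to how the split is made.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# X: (n_samples, 11) feature matrix = 3 clinical values + 8 image features;
# y: disease group in {0: enlargement, 1: gingivitis, 2: periodontitis}.
X = np.load("features.npy")          # hypothetical file names
y = np.load("labels.npy")

# Stratified split so every disease group is represented in each subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=5, stratify=y, random_state=0)

# One hidden layer of 10 units, mirroring the paper's model; L-BFGS stands in
# for the scaled conjugate gradient trainer used in the original work.
clf = MLPClassifier(hidden_layer_sizes=(10,), solver="lbfgs", max_iter=2000)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))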

Figure 8
Figure 8 Neural Network Model

Figure 9
Figure 9 Neural Network Training Performance

V. CONCLUSION AND FUTURE SCOPE

An automated system has been developed for the classification of periodontal diseases using H&E stained microscopic images of the tissues around the affected teeth, along with clinical data. The epithelium percentage of the whole tissue, the count and percentage of nuclei in the epithelium and connective tissue, the dilated blood vessels count and percentage, and the collagen fiber percentage are used as features during the classification process, which is performed using feedforward backpropagation artificial neural networks. It was found that using these mixed features together achieves more accurate classification results than using only the clinical data or only the features extracted from the H&E stained images.

We suggest the following as future work for this study:
1) Obtain a larger dataset of diseases, to provide better training and testing of the proposed system, and obtain more cases in the Aggressive Periodontitis group so that they can be included in the training and testing phases, making the system as generic as possible for all periodontal diseases.
2) Enhance the fully-automated background removal algorithm in Pseudocode I, to achieve results similar to the semi-automated one in Pseudocode II; also enhance the epithelium segmentation algorithm in Pseudocode III, to achieve higher sensitivity and specificity.
3) Try to find more generic algorithm parameters, to decrease the number of study cases for which we had to alter the parameters to get better results, as shown in the footnotes of Pseudocode III and Pseudocode IV.

Figure 10
Figure 10 Neural Network Training Confusion Matrix

PSEUDOCODE I. FULLY-AUTOMATED BACKGROUND REMOVAL ALGORITHM

1. Get the "a" channel from the L*a*b* color space of the original image, Image 1.
2. Smooth Image 1 with a 5 × 5 average filter to preserve only large detail [11], Image 2.
3. Convert Image 2 to black & white with a 0.2 threshold value, and then invert it, Mask 1.
4. Morphologically open Mask 1 to remove all black objects that have fewer than 1000 pixels, Mask 2.
5. Morphologically open Mask 2 to remove all white objects that have fewer than 1000 pixels, Mask 3.
6. Perform a morphological dilation with a 10 × 10 window on Mask 3, Mask 4.
7. Fill holes of Mask 4, Mask 5.
8. Perform a morphological erosion with a 10 × 10 window on Mask 5, Mask 6.
9. Remove areas from Mask 6 that are not on the border [12], Mask 7.
10. Morphologically open Mask 7 to remove all objects that have more than 10000 pixels, Mask 8.
11. Apply a final smoothing to Mask 8, Mask 9.
PSEUDOCODE V. CONNECTIVE TISSUE NUCLEI SEGMENTATION ALGORITHM

1. Get Mask 1:
a. Get the "Green" channel of the connective tissue part, retrieved from the epithelium segmentation phase, Channel 1.
b. Enhance the contrast of the grayscale Channel 1 by transforming the values using contrast-limited adaptive histogram equalization (CLAHE) [28], Channel 1_1.
c. Apply a 3 × 3 median filter to remove noise [19] in Channel 1_1 without harming edges, Channel 1_2.
d. Convert Channel 1_2 to black & white with a 0.8 threshold value, Mask 1_1.
e. Morphologically open Mask 1_1 to remove all objects that have fewer than 80 pixels, Mask 1.
2. Get Mask 2:
a. Get the optical density of the Haematoxylin as an 8-bit grayscale image, by colour deconvolution [23] of the connective tissue part, retrieved from the epithelium segmentation phase, Channel 2.
b. Adjust intensity values in the grayscale Channel 2, mapping them to new values such that 1% of the data is saturated at the low and high intensities of Channel 2.

TABLE I.
EPITHELIUM NUCLEI SEGMENTATION STATISTICS.

TABLE II.
CONNECTIVE TISSUE NUCLEI SEGMENTATION STATISTICS.

TABLE III.
DILATED BLOOD VESSELS SEGMENTATION STATISTICS.

TABLE IV.
COLLAGEN FIBER SEGMENTATION STATISTICS.