Automatic Aircraft Target Recognition by ISAR Image Processing based on Neural Classifier

This work proposes a new automatic target classifier, based on a system of combined neural networks, for ISAR image processing. The novelty of our work is twofold. We first present a novel automatic classification procedure, and then we discuss an improved multimedia processing of ISAR images for automatic object detection. The classifier, composed of a combination of 20 feed-forward artificial neural networks, is used to recognize aircraft targets extracted from ISAR images. Two recently introduced image processing techniques are combined to improve the shape and feature extraction process. Performance is analyzed in comparison with conventional multimedia techniques and standard detectors. Numerical results obtained from extensive simulation trials demonstrate the efficiency of the proposed method for automatic aircraft target recognition.


I. INTRODUCTION
In recent years, research on inverse synthetic aperture radar (ISAR) imaging of moving targets has grown rapidly [1]. High-resolution images of targets of interest at long range can be acquired with ISAR. Moreover, ISAR imaging is becoming an irreplaceable tool for non-cooperative automatic target recognition (ATR). Applications include detection, imaging, and classification of ships and aircraft with airborne, maritime, and land-based radar systems [2], [3]. In the last years, many ISAR ATR techniques have been developed in the open literature. In these techniques, the image is first segmented and then the object is recognized [4], [5].
Image segmentation is the process of partitioning a digital image into multiple meaningful regions. The segmentation is usually based on measurements taken from the image, such as gray level, color, texture, depth, or motion. The result of image segmentation is a set of segments that collectively cover the entire image. All the pixels of the same region are similar with respect to some characteristic or computed property, such as color, intensity, or texture. Edge detection is one of the most frequently used techniques in digital image processing. Object recognition is the task of finding a given object in an image or in a video sequence. Any object in an image has many characteristic features that can be extracted to provide a feature description of the object. This description, extracted from a training image, can then be used to identify the object when attempting to locate it in a test image containing many other objects [6].
Image segmentation is usually done using edge detection techniques such as the Sobel, Prewitt, Roberts, and Canny methods [7]. Then, only some features characterizing the ISAR images are tested to identify what kind of target has been detected [8]-[11]. In fact, typical algorithms first detect the edges of an ISAR image and then adopt 1-D descriptors, such as Fourier descriptors (FD) [12], or 2-D descriptors, such as Fourier-wavelet descriptors [13], for feature extraction. This is computationally more efficient than evaluating the whole target. Other methods exploit optimal classifiers to determine the specific kind of target [14]-[16]. However, in all these techniques, each target profile is presented as an input feature vector to the classifier. Since real-time performance is a crucial requirement in radar target recognition, neural networks, with their massively parallel structure and capacity for learning, are usually used in the classifier [17].
The novelty introduced in this work is twofold. We first present a novel automatic classification procedure, based on combined neural network signal processing. Then, we discuss an improved multimedia processing of ISAR images for automatic object detection. In particular, the neural classifier (NC) is developed as an alternative to the approaches existing in the literature (e.g., the Support Vector Machine based algorithms, which are widely used for pattern recognition and classification). Designers and users will then be able to choose among the different approaches depending both on the nature of the problem to be solved and on the technology used. In our case, the NC is composed of 20 combined feed-forward artificial Neural Networks (NNs). Nevertheless, the number of NNs can be changed to obtain different performance levels, depending on the difficulty of the classification problem. Moreover, it is well known that the structure of a neural network is fixed on the basis of the problem to be solved and the available data. Furthermore, no deterministic way exists to define the number of hidden layers and the number of neurons. In our case, after several experiments, all the NNs have been built with one input layer of 168 neurons and two hidden layers of 8 neurons. The output of each NN consists of one neuron that returns a value characterizing the class of the related input pattern (the Fourier descriptors of the ISAR image to classify).
The ISAR images are first pre-processed with conventional filters in order to reduce the speckle noise. Then, the combination of two image processing techniques recently introduced in the literature is exploited to improve the shape and feature extraction process. First, the Smallest Univalue Segment Assimilating Nucleus (SUSAN) algorithm [18] is applied to the ISAR image. Then, the output of the SUSAN method is processed by a modified level set evolution method, namely distance regularized level set evolution (DRLSE) [19]. We use the first method (i.e., SUSAN) as a pre-processing step, in order to segment the input image into two regions: the ensemble of the target pixels and the ensemble of the background pixels (i.e., pixels not belonging to the target). Then, the DRLSE algorithm is applied to the first ensemble (i.e., the target pixel region) as an edge linking method, whose output is the target contour. Once the aircraft shape is extracted, the invariant Fourier descriptors (FD) are computed and used as the input of the neural classifier.
The remainder of this paper is organized as follows. In Section II, the proposed neural network classifier is described, while the conventional multimedia processing is illustrated in Section III. Section IV presents the proposed ISAR image segmentation and shape extraction techniques. Numerical results and comparisons are outlined in Section V, and finally, our conclusions are drawn in Section VI.

II. NEURAL CLASSIFIER FOR OBJECT DETECTION
In this Section, we discuss the structure of the proposed ATR scheme, composed of a system of 20 feed-forward artificial Neural Networks (NNs) [20]. The recognition process must be invariant with respect to the target position. At least three different techniques for invariant neural network recognition have been recently proposed. The first approach, namely invariance by training, compensates for the pattern shift by taking into account different targets for different pattern shifts during the training phase [21], [22]. The main drawback of such an approach is that it is inapplicable in many operating situations. In fact, the number of possible pattern variations makes the training set too large, which also increases the computational complexity of the learning system. The second technique, namely invariance by structure, uses neural networks whose outputs are always invariant to certain transformations. The disadvantage of such an approach is that high-order neural networks are required: their implementation is too complicated and their application is limited. Recently, the combined neural network method has been emerging as the most suitable scheme to lower the system computational complexity (see [16] and references therein). Finally, the third technique uses feature vectors as inputs for the neural networks. These feature vectors are invariant to the required transformations and, hence, this method is called invariant feature space. This kind of recognition system usually uses the magnitude of the Fourier transform, which is invariant to linear shifts of its input vector [23]. The advantages of using an invariant feature space are as follows: the number of features can be reduced to realistic levels, the requirements on the classifier are relaxed, and the invariance for all input objects is ensured. On the other hand, the main drawback of using invariant feature spaces is the processing time (which can be very long) needed to extract the features before the classifier can be employed.
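The shift invariance of the Fourier magnitude mentioned above can be checked numerically. The following is a minimal sketch, assuming a random 168-element pattern and a circular shift (the discrete transform is exactly invariant to circular shifts only); the function name and the test values are illustrative and not taken from the paper.

```python
import numpy as np

def magnitude_spectrum(x):
    """Return |DFT(x)|, a feature vector unaffected by circular shifts of x."""
    return np.abs(np.fft.fft(np.asarray(x, dtype=float)))

# Hypothetical input pattern (random values, used only for illustration).
rng = np.random.default_rng(0)
pattern = rng.normal(size=168)
shifted = np.roll(pattern, 17)          # circularly shifted copy of the pattern

# The two patterns differ, but their magnitude spectra coincide.
same = np.allclose(magnitude_spectrum(pattern), magnitude_spectrum(shifted))
```

A shift only multiplies each DFT coefficient by a unit-modulus phase factor, which is why the magnitudes are unchanged.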

Fig. 1. Example of one Neural Network for object detection
In this work, we focus on a neural classifier that uses Fourier descriptors as inputs for the neural networks. As said before, no deterministic way exists to define the number of hidden layers and the number of neurons. Hence, in our case, referring to the block scheme of Fig. 1, each NN is made of 4 layers: (i) one input layer composed of 168 neurons, equal to the size of the input patterns; (ii) two hidden layers composed of 8 neurons; (iii) one output layer with only one neuron. Linear activation functions have been applied to both the input and the output layer, while non-linear activation functions (in particular, sigmoid functions) have been chosen for the hidden layers. The overall scheme of the Neural Classifier (NC), obtained by a combined system of these NNs, is depicted in Fig. 2.
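As an illustration of the layer structure just described, the forward pass of one NN could be sketched as follows. This is not the authors' implementation: it assumes 8 neurons in each hidden layer, a tanh-shaped sigmoid (the text only says "sigmoid"), and random placeholder weights instead of trained ones; `init_network` and `forward` are hypothetical names.

```python
import numpy as np

def init_network(rng, sizes=(168, 8, 8, 1)):
    """Random (untrained) weights and zero biases for a 168-8-8-1 network."""
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Propagate one pattern of 168 Fourier descriptors through the network."""
    a = np.asarray(x, dtype=float)      # linear (identity) input layer
    for W, b in params[:-1]:
        a = np.tanh(W @ a + b)          # sigmoid-type hidden layers (assumed tanh)
    W, b = params[-1]
    return (W @ a + b).item()           # single linear output neuron
```

In the paper the weights are obtained by training (Levenberg-Marquardt back-propagation); the random initialization above only makes the sketch runnable.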
The output value of each NN can range between -1 and 1 depending on the input pattern, and each input pattern contains the 168 Fourier descriptors of a specific ISAR image (i.e., of a specific target). For example, consider an NN trained to recognize the target "TG1": the more positive the network output, the higher the probability that the input pattern belongs to the (correct) class TG1. Negative output values mean that the input pattern is not an element of the considered class. Therefore, the proposed NC takes a pattern made of 168 elements as input (i.e., the Fourier descriptors of the ISAR image) and recognizes the correct class of the target. In particular, four different classes of targets, named TG1, TG2, TG3, and TG4, have been used in this paper as a proof of concept of the proposed combined classifier. When the NC receives a pattern belonging to one of these four classes, it returns an output value that refers to the selected class. As shown in Fig. 2, the NC is composed of two main boxes: a) the inner classifiers, CLi; b) the Final Decision-Maker, FDM.

Fig. 2. Neural Classifier
Each CLi refers to the related class (1, 2, 3 or 4) and consists of a neural sub-system, composed of five NNs, that decreases the error probability of the trained NNs. Indeed, each CL is composed of five differently trained NNs used to classify the same target class (see Fig. 3). The determining boxes, DETi, with i = 1, 2, 3, 4, perform a very important task, which is described below. The rationale of our NC is as follows. The neural sub-system CLi contains five NNs which are separately trained, each with the aim of classifying the target TGi. For a fixed input pattern belonging to the class TGi, if at least three NNs return the correct output, CLi correctly classifies the input pattern as a TGi pattern. Obviously, in the ideal case all five NNs perform the correct target recognition; the majority rule is applied here to mitigate the possible output errors made by one or two NNs. Therefore, each CL exploits the majority rule to classify the input patterns. When the NC receives a pattern to classify, only one sub-system should be active at a time (OUTi is equal to 1 when CLi is active, -1 otherwise). The AVGi output returns the average value of the five NN outputs. The FDM box simply selects the active input and shows it as the output of the whole NC system. Nevertheless, more than one CLi may be active at the same time. In these cases, the DETi and FDM boxes play a very important role, exploiting the average values of the related neural sub-systems. For example, given a generic input pattern TGx, if the CL1 and CL2 outputs are both active, DET1 and DET2 look at the average values (AVG1 and AVG2) of the five NNs of the CL1 and CL2 neural sub-systems, respectively. At this point, the FDM box operates as follows. If the average value AVG1 is larger than AVG2, the input pattern belongs to the CL1 class (TG1). In the opposite case, the input pattern belongs to the CL2 class (TG2). Obviously, in the worst (limit) case in which all the CLi are active, the FDM box selects the output of the CL with the largest average value.
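The decision logic of the CLi, DETi, and FDM boxes described above can be sketched as follows. This is a simplified reading of the text, not the authors' code: a positive NN output is taken as a vote for the class, the function names are hypothetical, and the case in which no CLi is active (not discussed in the text) simply returns `None`. The numerical outputs in the example are invented for illustration.

```python
import numpy as np

def inner_classifier(nn_outputs):
    """CLi: majority vote over the five NN outputs, plus their average (AVGi)."""
    outputs = np.asarray(nn_outputs, dtype=float)
    votes = int(np.sum(outputs > 0))        # positive output = vote for class i
    out = 1 if votes >= 3 else -1           # OUTi: 1 if active, -1 otherwise
    return out, float(outputs.mean())

def final_decision(all_nn_outputs):
    """FDM: return the 1-based index of the winning class, or None if no CL is active."""
    results = [inner_classifier(o) for o in all_nn_outputs]
    active = [i for i, (out, _) in enumerate(results) if out == 1]
    if not active:
        return None                         # case not specified in the text
    # Ties between active sub-systems are resolved by the largest average value.
    return max(active, key=lambda i: results[i][1]) + 1

# Hypothetical outputs of the 4 x 5 NNs for one input pattern.
example = [[0.9, 0.8, -0.2, 0.7, 0.6],      # CL1: 4 votes, average 0.56
           [0.4, 0.3, 0.2, -0.5, -0.1],     # CL2: 3 votes, average 0.06
           [-0.9] * 5,                      # CL3: inactive
           [-0.8] * 5]                      # CL4: inactive
decision = final_decision(example)          # CL1 and CL2 both active; TG1 wins
```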

III. CONVENTIONAL MULTIMEDIA PROCESSING
The conventional multimedia processing methods for edge detection usually exploit the Sobel, Prewitt, Roberts, and Canny detectors [24]. In particular, the Sobel operator performs a 2-D spatial gradient measurement on an image and so emphasizes regions of high spatial frequency that correspond to edges. The Prewitt operator is an approximate way to estimate the magnitude and orientation of the edge. The Roberts operator also performs a 2-D spatial gradient measurement on an image and highlights regions of high spatial frequency, which often correspond to edges. Finally, the Canny detector finds edges by first suppressing the noise in the image without affecting the features of the edges, and then applying the gradient to find the edges and a critical value for thresholding.
The classical edge detector proposed almost 20 years ago by Canny [25] performs remarkably well, given its simplicity and elegance. Canny's edge detector attempts to maximize localization and signal-to-noise ratio simultaneously. A typical implementation of the Canny edge detector is as follows [26]: (i) smooth the image with an appropriate Gaussian filter to reduce desired image details; (ii) determine the gradient magnitude and gradient direction at each pixel; (iii) if the gradient magnitude at a pixel is larger than those at its two neighbors in the gradient direction, mark the pixel as an edge; otherwise, mark the pixel as background; (iv) remove the weak edges by hysteresis thresholding. Indeed, in recent comparisons of edge detector performance (see for example [24] and references therein), the Canny detector was the best or one of the best. For this reason, in the remainder of this paper we compare the results obtained with the new multimedia processing, described in the next Section, with those obtained with the Canny operator.
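The four steps listed above can be sketched in a few lines of NumPy. This is a simplified illustration and not the implementation of [26]: the Sobel kernels, the relative thresholds `low` and `high`, and the helper name `correlate2d` are assumptions made here, and ties between neighboring magnitudes are resolved so that plateaus are not marked twice.

```python
import numpy as np

def correlate2d(img, kernel):
    """Same-size 2-D correlation with edge replication (odd-sized kernels)."""
    kh, kw = kernel.shape
    H, W = img.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((H, W))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + H, dx:dx + W]
    return out

def canny_sketch(img, sigma=1.0, low=0.1, high=0.3):
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    # (i) separable Gaussian smoothing
    r = max(1, int(3 * sigma))
    ax = np.arange(-r, r + 1)
    g = np.exp(-ax**2 / (2 * sigma**2))
    g /= g.sum()
    smooth = correlate2d(correlate2d(img, g[None, :]), g[:, None])
    # (ii) gradient magnitude and direction (Sobel kernels, assumed here)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx, gy = correlate2d(smooth, kx), correlate2d(smooth, kx.T)
    mag = np.hypot(gx, gy)
    sector = np.round((np.rad2deg(np.arctan2(gy, gx)) % 180) / 45).astype(int) % 4
    # (iii) non-maximum suppression along the quantized gradient direction
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}   # (dy, dx) per sector
    pm = np.pad(mag, 1)
    thin = np.zeros_like(mag)
    for s, (dy, dx) in offsets.items():
        n1 = pm[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        n2 = pm[1 - dy:1 - dy + H, 1 - dx:1 - dx + W]
        keep = (sector == s) & (mag > n1) & (mag >= n2)
        thin[keep] = mag[keep]
    # (iv) hysteresis: keep weak pixels only if 8-connected to a strong pixel
    peak = thin.max()
    if peak == 0:
        return np.zeros((H, W), dtype=bool)
    weak = thin >= low * peak
    edges = thin >= high * peak
    while True:
        p = np.pad(edges, 1)
        grown = np.zeros_like(edges)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grown |= p[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        grown &= weak
        if np.array_equal(grown, edges):
            return edges
        edges = grown
```

For a vertical step image, the sketch is expected to return a thin line of edge pixels next to the step and nothing elsewhere.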

IV. PROPOSED MULTIMEDIA PROCESSING

A. Database creation
We have used 4 targets, corresponding to 4 different military aircraft: one MIG-29, one F-104, one F-22, and one Eurofighter Typhoon. The ISAR images of these targets, provided by the multinational firm MBDA (Rome, Italy), are shown in Fig. 4. We have then created a database of more than 500 ISAR images, each one representing a target at a different azimuth angle, as shown for example in Fig. 5 for 15 different angles of the Eurofighter Typhoon target. The training and validation sets for each target class are made of 30 and 120 ISAR images, respectively. All the NNs described in Section II have been trained by using the well-known Levenberg-Marquardt back-propagation algorithm [27].

B. Pre-Processing
ISAR images are usually affected by a multiplicative noise known as speckle noise. It is due to the interference produced by radar waves and results in light and dark pixels that drastically reduce the image quality. As a consequence, the automatic interpretation of the image, as well as shape and feature extraction, becomes cumbersome. Therefore, image pre-processing is the first and crucial phase, needed to reduce the speckle effects and improve the image quality. A great number of different filters have been proposed in the open literature, such as the Frost [28], Lee [29], and median [18] filters. Since our ISAR images are affected by low speckle noise, following the same approach as [18], we have used a linear filter followed by a median filter to improve the image quality. Note that, for images highly corrupted by noise, the median filter has been replaced by a Lee filter [29] to facilitate the automatic segmentation process.
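A minimal sketch of the "linear filter followed by a median filter" chain is given below. The text does not specify the linear filter or the window size, so a 3 x 3 mean filter and a 3 x 3 median window are assumed here; `despeckle` is a hypothetical name.

```python
import numpy as np

def _windows(img, size):
    """Stack all size x size neighborhoods (edge-replicated) along the last axis."""
    img = np.asarray(img, dtype=float)
    r = size // 2
    H, W = img.shape
    p = np.pad(img, r, mode="edge")
    return np.stack([p[dy:dy + H, dx:dx + W]
                     for dy in range(size) for dx in range(size)], axis=-1)

def despeckle(img, size=3):
    """Mean (linear) filter followed by a median filter, both size x size."""
    smoothed = _windows(img, size).mean(axis=-1)       # assumed linear filter
    return np.median(_windows(smoothed, size), axis=-1)
```

The Lee filter used for highly corrupted images is not sketched here, since its parameters are not given in the text.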

C. Object Detection
The shape extraction process, i.e., the process by which the contour plots are extracted, is performed here using a cascade of two different methods. First, we apply the SUSAN algorithm [18] to the ISAR image, and then the output of the SUSAN method is processed by a recently introduced level set evolution (LSE) method, called distance regularized level set evolution (DRLSE) [19]. In particular, the SUSAN method is used here to separate the pixels of the ISAR image into two different regions: target pixels and background pixels. Then the DRLSE algorithm is used as an edge linking method to extract the target contour.
In more detail, each pixel in the input ISAR image is processed with a circular mask (also called window or kernel), and the sum of the grayscale comparisons between the mask center (the nucleus) and the local mask area (known as the USAN, Univalue Segment Assimilating Nucleus) is calculated. The mask is placed at each point of the input image, and the brightness of each pixel within the USAN area is compared with that of the nucleus (center point) according to eq. (1), where P_0 and P_i correspond to the pixel of the nucleus and any pixel of the USAN area, respectively, im(x_i, y_i) is the gray level of the pixel with coordinates (x_i, y_i), and t stands for the brightness difference threshold. The comparison expressed by eq. (1) is performed for each pixel within the mask and the sum S of all these comparisons is evaluated. Finally, the sum S is compared with a fixed threshold G (namely the geometric threshold [28]), which is set to ¾ of the maximum value that S can take. The value of the treated pixel is then replaced according to eq. (2). As in [18], we consider the application of the SUSAN algorithm as a pre-processing step, in order to segment the input image into two regions of pixels: the ensemble of the target pixels and the ensemble of the background pixels. Hence, we add another condition to the standard SUSAN algorithm, modifying eq. (2), where the threshold t has been chosen according to [1] and k is a constant that depends on the image size-to-target ratio. Finally, the DRLSE method of [19] is applied to extract the target shape from the results of the modified SUSAN algorithm.
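For reference, the standard (unmodified) SUSAN edge response can be sketched as follows. This follows the commonly used binary form of the algorithm, in which a mask pixel counts as similar when its brightness differs from the nucleus by at most t; it is an assumption that eq. (1) and eq. (2) take this form, and the additional condition introduced by the authors is not reproduced. The default t = 27 and the function names are placeholders.

```python
import numpy as np

def circular_mask(radius=3.4):
    """Offsets (dy, dx) of the 37-pixel circular SUSAN mask."""
    r = int(radius)
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= radius * radius]

def susan_edge_response(img, t=27.0):
    """Standard SUSAN response: G - S where S < G, and 0 elsewhere."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    offsets = circular_mask()
    r = max(abs(d) for off in offsets for d in off)
    padded = np.pad(img, r, mode="edge")
    usan = np.zeros((H, W))                          # the sum S, pixel by pixel
    for dy, dx in offsets:
        neighbour = padded[r + dy:r + dy + H, r + dx:r + dx + W]
        usan += np.abs(neighbour - img) <= t         # binary brightness comparison
    g = 0.75 * len(offsets)                          # geometric threshold G
    return np.where(usan < g, g - usan, 0.0)
```

In this sketch the nucleus is counted in the sum, so S can take a maximum value of 37 and G equals 27.75.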

D. Features Extraction
Fourier descriptors (FD) have been frequently used as features for image processing, remote sensing, shape recognition, and classification [30]. They are chosen for their good performance in recognition systems and for the simplicity and efficiency of their implementation. In fact, they are invariant to geometrical transformations, such as translation, scaling, and rotation. The authors in [18] have used the centroid distance as shape signature, i.e., the distance of the boundary points from the centroid of the shape. Conversely, here we have applied a simple discrete Fourier transform (DFT) to the shape boundaries extracted by the previously described methods. In particular, we have represented the boundary of a target by a complex vector, whose elements are the coordinates of the contour points. Then, we have applied the 1-D DFT to this complex vector, obtaining the FDs of the target. The FDs obtained in this way are invariant to geometrical transformations.
Finally, in order to decrease the computational complexity of the entire system, we have constrained the vector length to 168 elements, corresponding to 168 FDs of the target's contour. These FDs have been used as the inputs of the new ATR technique described in Section II.
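The computation of the FDs from a contour can be sketched as below. The text does not say how the contour is reduced to 168 points or how the invariance is enforced, so two assumptions are made here and should be read as such: the contour is resampled by index, and a common normalization is used (zeroing the first coefficient for translation, dividing by the magnitude of the second for scale, and keeping magnitudes only for rotation). The function names are hypothetical.

```python
import numpy as np

def fourier_descriptors(contour_xy, n_points=168):
    """1-D DFT of a closed contour given as an (N, 2) array of (x, y) points."""
    pts = np.asarray(contour_xy, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]                  # boundary as a complex vector
    # Assumed resampling: pick n_points contour samples evenly spaced by index.
    idx = np.linspace(0, len(z), n_points, endpoint=False).astype(int)
    return np.fft.fft(z[idx])

def normalised_descriptors(coeffs):
    """Assumed normalization giving translation, scale, and rotation invariance."""
    mags = np.abs(coeffs)        # magnitudes discard the rotation phase
    mags[0] = 0.0                # the first coefficient carries the translation
    return mags / mags[1]        # dividing by |F1| removes the scale
```

With this normalization, a contour and its translated, scaled, and rotated copy yield the same 168-element feature vector, up to numerical precision.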

V. PERFORMANCE ANALYSIS
In this Section, we first discuss our results obtained through the proposed multimedia processing technique, in terms of shape and edge extraction.
Then we present the results obtained through our neural classifier in terms of mean detection probability, comparing our results with state of the art detectors.

A. Results about shape and edge extraction
Here, we discuss the results of the proposed multimedia processing versus the conventional Canny detector. In the pre-processing steps, we have used a SUSAN circular mask composed of 37 pixels, with a radius of 3 pixels. Referring to Fig. 6, the noisy ISAR images are first pre-processed to reduce the speckle noise (Fig. 6a), then the SUSAN method (Fig. 6b) is applied in order to extract the edge of the target (image segmentation).
Finally, the DRLSE method (Fig. 6c) and the Canny detector (Fig. 6d) are used as edge linking techniques and the FDs are computed from both images, as explained in Section IV. It is evident from the figures that the edges obtained with the Canny method (see also Fig. 7 and Fig. 8) are of poorer quality than those obtained with the new multimedia processing.

B. Results about target classification
The 168 FDs, describing the specific target under investigation, are then passed to the classification step, for the training and validation phase.As previously detailed, each ISAR image is characterized by a pattern of 168 Fourier descriptors.
The confusion matrix for the validation test, for the four targets examined in this paper, is shown in Tab. 1 and in Tab. 2 for the Canny detector and the new multimedia processing, respectively.
In particular, we have reported the percentage of correct detection, indicated by bold numbers, and the percentage of errors (false recognition) in the tables.For example, the target TG1 is recognized with a detection probability of more than 93% with the new multimedia processing, while the Canny method reaches only a percentage of 90.83%.
Then, the target TG4 is detected with lower probability in both cases, and it is misidentified as TG1 in 5.0% or 33.33% of the cases, for the new and the Canny processing, respectively. Although the last two targets are characterized by lower detection probabilities, the results obtained with the new multimedia processing seem really promising, since the NC is able to achieve quite large percentages of correct object detection. In particular, the difference is due to the bad performance of the Canny detector at different azimuth angles.
See for example Fig. 7 (a, b) and Fig. 8 (a, b), where the edges extracted with the Canny detector and with the new processing are compared. In particular, two targets of interest are considered: the Eurofighter Typhoon and the F-104, respectively. It is clearly visible from the figures that, in the case of the Canny detector, the smoothing effect due to the segmentation process does not allow the edges of the detected object to be extracted correctly. Another advantage of our processing lies in the combined structure of the proposed classifier. In fact, we always exploit the most suitable neural sub-system for each target class, i.e., the inner classifier CL composed of 5 NNs. The CL sub-system aims at decreasing the error probability with respect to the conventional case in which only one NN decides the class of the target. Moreover, the further use of average values (performed by the DETi sub-systems) improves the performance of the proposed classifier when ambiguities exist at the outputs of the neural sub-systems. Finally, in order to prove the efficiency of our NC with respect to state-of-the-art detectors, a comparative analysis is shown in Tab. 3. In particular, the mean values of the correct classification probability are reported in Tab. 3 for our proposed NC (mean recognition of 81.6%) and for two classifiers proposed in [18]. The authors in [18] obtain a mean recognition percentage of 75.98% using a K Nearest-Neighbor (K-NN) classifier, and they improve the system performance by exploiting the Support Vector Machine (SVM) algorithm (reaching a mean detection value of 80.37%). However, this latter approach appears less effective than the one presented here.

VI. CONCLUSIONS AND FUTURE WORKS
This work has proposed a new automatic target classifier, based on a system of combined neural networks, for ISAR image processing. The novelty of our work is twofold. We have first introduced a novel automatic classification procedure, and then we have discussed an improved multimedia processing of ISAR images for automatic object detection. We have exploited a neural classifier composed of a combination of 20 feed-forward artificial neural networks. The classifier is used to recognize aircraft targets extracted from ISAR images. The combination of two image processing techniques recently introduced in the literature is exploited to improve the shape and feature extraction process. Then, invariant Fourier descriptors are computed and used as input features to our combined system. Performance is analyzed in comparison with conventional multimedia processing techniques as well as with classical automatic target recognition systems. Numerical results, obtained from extensive simulation trials, demonstrate the efficiency of the proposed approach for automatic aircraft target recognition. Future work will address the improvement of the performance of the single NNs by applying suitable optimization algorithms to the NN learning process. Indeed, it is possible to perform a multivariate function decomposition with the aim of optimizing the learning of Multi-Input-Single-Output (MISO) feed-forward Neural Networks [31]. Furthermore, applying new powerful search algorithms (e.g., meta-heuristic algorithms such as those shown in [32]-[35]) can increase the generalization capability of the Neural Networks, in particular after they are built by using a partitioning of the domain (see [36]). Finally, other more elaborate algorithms could be applied to the multimedia processing, starting from novel concepts already existing in the literature (e.g., [37], [38]).

Fig. 3. Block scheme of the generic i-th CL block.
Tab. 1. Object detection performance of our neural classifier with the