Object based Image Splicing Localization using Block Artificial Grids

People freely share pictures with their loved ones and others through smartphones and social networking sites. The news industry and courts of law use pictures as evidence in their investigations. At the same time, user-friendly photo editing tools make it easy to alter the content of pictures, which calls their validity into question. For over two decades, research in image forensics has sought ways to determine a picture's trustworthiness. This paper proposes an efficient statistical method based on Block Artificial Grids in double-compressed images to identify regions attacked by image manipulation. In contrast to existing approaches, the proposed approach extracts the artefacts from individual objects instead of the entire image. A localization algorithm is proposed that computes the cosine dissimilarity between objects and exposes, as tampered, the object with the maximum dissimilarity to the others. The experimental results show that the proposed method outperforms other current methods.
Keywords: Image forensics; splicing localization; block artificial grids; object segmentation; double compression


I. INTRODUCTION
Nowadays, people freely share their ideas, pictures, and comments on social networking sites. The use of images has grown enormously in many areas, such as government digitization initiatives, evidence in courts of law, journalism, science, and forensic discovery [1]. At the same time, widely available image editing tools make it easy to manipulate images and videos in ways that human vision cannot detect. Copy-move, splicing, resampling, and cloning are a few of the manipulation attacks used to tamper with images. A manipulated image significantly undermines trustworthiness when used as evidence [2] [3]. This poses a significant challenge for image forensics: to distinguish original images from manipulated ones and, at the same time, to establish authenticity and locate the tampered region [4].
Digital image forensics, a branch of multimedia security, aims at designing powerful techniques to detect manipulation attacks on images [5]. Active methods, such as watermarking, embed an authentication code in the original image and later use it to verify authenticity. In contrast, passive methods, such as tampering detection, do not require any external clue to assess the image's authenticity. Various tampering detection techniques in the literature assume that different camera models or different processing operations introduce inherent patterns into an image [6] [7] [8] [9]. Furthermore, they assume that these underlying patterns are consistent throughout the original image and that any manipulation attack introduces inconsistencies in them. These inconsistencies can thus be used as forensic features to identify image tampering [10] [11].
In image splicing, a part of a source image is copied and pasted into the target image. Post-processing techniques are then applied to the tampered region to make the attack invisible and difficult for the human eye to trace [12]. This challenge has attracted many researchers to develop techniques for detecting image splicing. Many of these techniques extract image features and use classification to reveal forgery, and they achieve high success rates [13] [14]. However, for many practical purposes it is also worth locating the tampered region to gain confidence in the decision. Image splicing localization brings additional challenges, as it requires pixel-level rather than image-level analysis [15] [16].
Images captured by digital cameras are commonly stored in the Joint Photographic Experts Group (JPEG) format. The JPEG format uses lossy compression and is responsible for the proliferation of images on websites and social networking sites. In JPEG compression, the image is divided into non-overlapping 8 x 8 blocks, the discrete cosine transform (DCT) is computed for each block, and the coefficients are then quantized using a standard quantization matrix. When a splicing attack manipulates the image, it introduces discontinuities, and these statistical traces, such as JPEG quantization artefacts and JPEG grid alignment discontinuities, are used to expose tampering attacks [17] [18].
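The block pipeline described above can be illustrated with a minimal sketch. It assumes an orthonormal type-II DCT and a single uniform quantization step `q` in place of a real JPEG quantization matrix; the function names `dct2`, `quantize`, and `split_blocks` are illustrative and do not come from the paper.

```python
import math

N = 8  # JPEG block size


def split_blocks(image):
    """Yield the non-overlapping 8x8 blocks of a 2-D list (partial blocks are skipped)."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - rows % N, N):
        for c in range(0, cols - cols % N, N):
            yield [row[c:c + N] for row in image[r:r + N]]


def dct2(block):
    """Orthonormal type-II 2-D DCT of one 8x8 block."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, q):
    """Assumption: one uniform step q for all coefficients, not a JPEG table."""
    return [[round(c / q) for c in row] for row in coeffs]
```

For a flat block, only the DC coefficient survives quantization, which is the kind of lossy rounding that leaves the statistical traces discussed above.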

A. Related Work
When there is a splicing attack, the tampered blocks undergo single compression, while the remaining blocks undergo double compression (DQ). In [19], the authors modelled periodic DCT patterns and evaluated each block of the image for its conformance to the model. Any block whose probability distribution differs from that of the original is classified as manipulated by a tampering attack. A similar approach is found in [20], where the authors assume that the distribution of JPEG coefficients changes with the number of recompressions. They train a set of support vector machines (SVMs) on first-digit artefacts and estimate the probability of each block being single or double compressed, thereby exposing the splicing attack.
The method in [21] compares the discontinuities using the quality factor adopted in the tampered region, on the principle that a JPEG ghost, a local spatial minimum, corresponds to the tampering attack. Its limitation is that it works only if the tampered region has a lower quality factor than the rest of the image. As an alternative to DQ discontinuities, the authors of [22] modelled the DCT coefficient distributions of the entire image using the degree of quantization; inconsistencies with the model indicate a tampering attack. Unlike the DCT-based methods above, the output of this method is not probabilistic, which makes the technique efficient but relatively difficult to interpret.
In [17], tampering detection and localization use the probability distribution of the DCT coefficients. Three features that can reliably distinguish tampered regions from original ones are used to obtain accurate localization results. However, the refinement of the probability map in post-processing influences the localization results. To overcome this, [23] used a mixture model based on the normalized grey level co-occurrence matrix (NGLCM) and obtained more accurate localization with prior knowledge of both tampered and original regions. To this end, they used the conditional probabilities of tampered and original regions of DCT blocks in first-, second-, and third-order statistics.
In recent works, deep-learning techniques have been applied to tampering detection and localization. These methods learn the relevant features automatically within the network [24]. The authors of [25] extracted histograms of DCT coefficients from the input image and designed a one-dimensional convolutional neural network (CNN) that takes them as input to identify tampered regions by distinguishing single- and double-compressed areas. In [26], a two-layer CNN is proposed in which a stacked auto-encoder model learns elaborate features for individual patches of the spliced image and uses contextual information to make the localization accurate. These methods provide block-level accuracy.
To obtain pixel-level accuracy, [27] proposed a fully convolutional network (FCN) to locate spliced regions. An FCN is a particular type of CNN that replaces the fully connected layers with convolutional layers having a 1x1 kernel. It classifies each pixel as spliced or original. The authors used three FCNs to deal with different scales of image content, but these methods have the drawback that they lose or smooth detailed structures and ignore small objects. To mitigate this effect, [24] used a region proposal network (RPN), which is a kind of FCN and can be trained end-to-end specifically for detection. Using an FCN with an RPN, the authors achieved better results than FCN-only methods as well as other conventional methods. However, the computational complexity of deep-learning techniques is high.
In [28], a localization architecture is proposed that uses resampling features to capture artefacts. A long short-term memory (LSTM) network, followed by an encoder network, is designed to differentiate tampered regions from original ones. The decoder network learns features to localize the tampered region. With a final soft-max layer, the network parameters are learned through the back-propagation algorithm using ground truth masks. The model is capable of localizing at the pixel level with high precision.
Although deep learning-based techniques improve accuracy, they require training on large labelled databases, and their computational complexity is very high. The networks extract high-level visual features and neglect low-level features, which can be sources of forensic cues. In this paper, we propose a statistical forensic technique that can localize the tampered region from a single image in the presence of double compression. Unlike other techniques that produce probability maps from 8x8 DCT coefficients, we propose a statistical model that characterizes the fingerprints of block artificial grids (BAG) in the spatial domain and works for any compression with any quality factor.

B. Our Contribution
Over the years, various splicing localization techniques have been proposed in the literature. Still, because splicing is complex, there is scope to improve their robustness and effectiveness. In this regard, our work makes the following contributions: i) We propose object-based segmentation, in which features are extracted from individual objects, and for each object we estimate the variance of the BAG noise. ii) Instead of probability maps, we propose a statistical localization algorithm based on pair-wise dissimilarity among objects to separate the suspicious object from the original ones.
The rest of the paper is organized as follows: Section II describes the JPEG fingerprints from block artificial grids and the proposed statistical method to expose and localize the splicing attack. Section III presents the experimental and evaluation results, and Section IV concludes the paper.

II. PROPOSED METHOD
The primary goal is to localize the tampered region in the spliced image. As shown in Fig. 1, the proposed method has three stages: object-based image segmentation to extract individual objects from the spliced image, estimation of each object's variance using block artificial grids, and the proposed localization algorithm, which uses pair-wise dissimilarity among objects to expose the tampered region.

A. Object Segmentation
Object detection is a challenging computer vision problem that involves detecting and classifying objects in an image or video. Among the many popular object detection frameworks, Mask R-CNN [29], developed by Facebook research, is frequently used. It is an extension of Faster R-CNN that estimates the object's mask and human pose. It tops the COCO suite of challenges in instance segmentation, bounding-box object detection, and person keypoint detection.
Fig. 2. Mask R-CNN framework, adopted from [29].
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 11, 2020
Using the Mask R-CNN framework shown in Fig. 2, we performed object detection and segmentation [30] on the given spliced image and extracted the individual masks of all objects. Then, for each mask, we located the corresponding object in the input image along with its bounding box area. The object corresponding to the mask is considered the foreground object, and the remaining part of the bounding box region is considered the background object.
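The foreground/background split can be sketched as follows. This is only an illustration: it assumes the segmentation step has already produced a non-empty binary mask as a 2-D list, and the helper names `bounding_box` and `split_object` are hypothetical rather than part of Mask R-CNN or the paper.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the nonzero mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def split_object(image, mask):
    """Split the bounding-box pixels into foreground (mask on) and background (mask off).

    Each entry is ((row, col), pixel_value) so positions relative to the
    8x8 JPEG grid of the full image are preserved.
    """
    top, left, bottom, right = bounding_box(mask)
    foreground, background = [], []
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            target = foreground if mask[r][c] else background
            target.append(((r, c), image[r][c]))
    return foreground, background
```

Keeping the original pixel coordinates matters here, because the grid features extracted later depend on where each pixel sits relative to the 8 x 8 block borders.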

B. Block Artificial Grids
Lossy JPEG compression leaves horizontal and vertical breaks in the image, commonly referred to as Block Artificial Grids (BAGs). An image's BAGs lie roughly at the borders of the 8 x 8 blocks, with a periodicity of 8 in both the horizontal and vertical directions. When a manipulation attack alters the image, BAGs appear within blocks instead of at their borders. This JPEG fingerprint is therefore used in image forensics [31].
When a digital camera compresses an image, it introduces noise, such as natural noise and BAG noise due to the JPEG compression factor. The artificial grid lines of an 8 x 8 block are weaker than the border edges. In [31], the authors separately extracted the weak horizontal and vertical lines of a grayscale image with a periodicity of 8, enhanced them, and then combined them; the result is referred to as the BAGs.
In this paper, we focus on extracting BAGs from colour images. Since the JPEG standard processes the luminance component in 8 x 8 blocks, we use only the luminance component (Y) of the YCbCr image rather than the Cb and Cr components. A median filter is applied to enhance the weak edges and remove the interference from strong image edges. To further reduce the influence of edges, as in [31], we ignore differentials greater than an experimentally chosen threshold. The enhanced horizontal edges are then accumulated over every two consecutive blocks. Finally, to equalize the amplitudes throughout the resulting image, a local median is subtracted from each element.
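The first steps of this extraction can be sketched as below. The luminance weights are the standard JFIF (BT.601) ones; the exact form of the "differential" is not recoverable from the text, so the sketch assumes an absolute second-order difference between vertically adjacent pixels, and the names `luminance` and `horizontal_edge_response` are illustrative.

```python
def luminance(r, g, b):
    """Y component of YCbCr from 8-bit RGB, using the JFIF (BT.601) weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def horizontal_edge_response(y, threshold):
    """Assumed differential: |2*y[m][n] - y[m-1][n] - y[m+1][n]|.

    Responses above `threshold` are treated as strong image edges and zeroed,
    so that only weak, grid-like horizontal lines remain. The first and last
    rows have no vertical neighbours on one side and are left at zero.
    """
    rows, cols = len(y), len(y[0])
    d = [[0.0] * cols for _ in range(rows)]
    for m in range(1, rows - 1):
        for n in range(cols):
            v = abs(2 * y[m][n] - y[m - 1][n] - y[m + 1][n])
            d[m][n] = v if v <= threshold else 0.0
    return d
```

The threshold is described in the text only as experimental, so any value passed here is a tuning choice rather than a figure from the paper.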
The weak horizontal edge image w_h is then obtained by applying the periodical median filter (Eq. 4), where w_h(m, n) are the elements of the extracted horizontal BAG lines. The median filter of Eq. 4 uses five elements with a spacing of eight, which smooths the strong and weak BAGs and removes everything else. The more elements the median filter uses, the better the BAGs are extracted.
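A periodic median filter of this kind, five samples spaced eight rows apart, can be sketched as follows. This is one plausible reading of the description and not a reproduction of Eq. 4: the centring of the window on the current pixel and the border handling (out-of-range samples are simply dropped) are assumptions, and `periodic_median_rows` is a hypothetical name.

```python
from statistics import median


def periodic_median_rows(e, period=8, taps=5):
    """Median of `taps` samples spaced `period` rows apart, centred on each pixel.

    Values that repeat every `period` rows (grid lines) survive the median,
    while isolated responses that do not repeat are suppressed.
    """
    rows, cols = len(e), len(e[0])
    half = taps // 2
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            samples = [e[i][n]
                       for i in range(m - half * period, m + half * period + 1, period)
                       if 0 <= i < rows]
            out[m][n] = median(samples)
    return out
```

The same routine applied along columns instead of rows would give the vertical counterpart.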
The vertical BAGs w_v are extracted similarly (Eq. 5).
The final BAG image, obtained by combining Eq. 4 and Eq. 5 into Eq. 6, gives the BAGs of the original image. In a tampered image, BAGs appear at abnormal positions, such as block centres. So, for a fixed 8 x 8 block w_mn, these abnormal BAGs can be obtained as in [31] (Eq. 7).

C. Localization of Splicing Region
The Mask R-CNN object detection framework [30] is used to detect the individual masks in the spliced image. Each mask is first split into foreground and background objects, and the BAGs are extracted from each as discussed in Section II-B.
To expose discrepancies in the BAGs of individual objects, we compute the BAG noise of each object from Eq. 7, where µ is the mean, σ is the variance, and R is the number of BAG features in w_mn.
After the BAG noise is obtained for each object, the pair-wise dissimilarity among objects is evaluated as follows. For each pair of distinct foreground or background objects, let the BAG noise be S_1 and S_2. The cosine dissimilarity between the two objects is then computed. The probable tampered object, the one with maximum dissimilarity to the other objects, is exposed from the dissimilarity matrix using the proposed localization algorithm (Algorithm 1).
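The pair-wise comparison can be sketched as below. Two assumptions are made: the cosine dissimilarity takes its usual form, one minus the cosine similarity of two nonzero feature vectors, and the suspicious object is the one whose summed dissimilarity to all other objects is largest. The original Algorithm 1 is not reproduced in the text, so this is only one plausible reading, and all function names are illustrative.

```python
import math


def cosine_dissimilarity(s1, s2):
    """1 - (s1 . s2) / (|s1| |s2|) for two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(s1, s2))
    norm1 = math.sqrt(sum(a * a for a in s1))
    norm2 = math.sqrt(sum(b * b for b in s2))
    return 1.0 - dot / (norm1 * norm2)


def dissimilarity_matrix(features):
    """Symmetric matrix of pair-wise cosine dissimilarities (zero diagonal)."""
    k = len(features)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = cosine_dissimilarity(features[i], features[j])
    return d


def most_dissimilar(features):
    """Index of the object with the largest total dissimilarity to the others."""
    totals = [sum(row) for row in dissimilarity_matrix(features)]
    return max(range(len(totals)), key=totals.__getitem__)
```

For example, with three objects whose feature vectors point in nearly the same direction and one that does not, `most_dissimilar` returns the index of the outlier.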

III. EXPERIMENTAL AND PERFORMANCE ANALYSIS
This section evaluates the proposed method on two datasets and compares its performance with contemporary techniques.
The CASIA dataset [32] is a widely used evaluation dataset for JPEG image splicing forgery detection; it consists of 7491 authentic and 5123 spliced images in JPEG, TIFF, and BMP formats. We randomly selected 1000 tampered images of animals, persons, birds, and vehicles of size 384 x 256 and segmented the objects using the Mask R-CNN framework. The proposed method was tested on these tampered images for localizing spliced regions. The qualitative evaluation on the CASIA dataset is shown in Fig. 3. The first row consists of four randomly chosen images, and the respective ground truth masks are given in the second row. The results of the proposed method are in the last row, where the spliced region is highlighted and the remaining area is marked as white. The results indicate that the proposed method localizes the spliced regions effectively.
To assess the proposed method's robustness, we also evaluated our approach on the Image Manipulation Dataset (IMD) [33]. The dataset contains 48 high-resolution JPEG-compressed images of size 3264 x 2448 with quality factors ranging from 20% to 100%. The images were cropped to 2048 x 1536 to reduce the computational complexity and spliced with each other to obtain 600 spliced images, on which the proposed method was then assessed. The evaluation results on our customized IMD spliced dataset, derived from [33], are shown in Fig. 4. The first row contains four randomly chosen sample images from the dataset. The ground truth masks are in the second row, and the proposed method's results are in the third row. The results indicate that the proposed method works well on high-resolution images.

A. Localization Accuracy
The accuracy of splicing localization is evaluated using the pixel-level F-measure. Two metrics are used: the True Positive Rate (TPR), which measures the rate of pixels correctly detected as spliced, and the False Positive Rate (FPR), which measures the rate of pixels falsely detected as spliced. They are defined as TPR = TP / (TP + FN) and FPR = FP / (FP + TN), where TP, FP, TN, and FN are the numbers of true positive, false positive, true negative, and false negative pixels, respectively. A good result has a high TPR and a low FPR. The F-measure is then defined from these quantities.
We evaluated the average TPR, FPR, and F-measure over all the selected images from the CASIA dataset and compared them with [23] and [24] to analyze the performance of the proposed method.
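The pixel-level metrics can be computed as in the following sketch. TPR and FPR follow their standard definitions; the F-measure is assumed here to be the usual F1 score, 2TP / (2TP + FP + FN), since the paper's own formula did not survive in the text. The function names are illustrative.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN over two binary masks of equal shape (1 = spliced)."""
    tp = fp = tn = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def true_positive_rate(tp, fn):
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    return fp / (fp + tn)


def f_measure(tp, fp, fn):
    """Assumed F1 form: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)
```

These helpers assume that each mask contains at least one spliced and one original pixel, so that no denominator is zero.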
The method of [23] is based on a normalized gray level co-occurrence matrix over 8x8 DCT coefficients and localizes the tampered objects using a Bayesian posterior probability map, whereas the method of [24] uses deep learning, combining Fully Convolutional Networks (FCN) with a Region Proposal Network (RPN), to localize the tampered region. To evaluate the proposed method, we therefore compared our results with both a conventional and a deep learning method. Table I gives the comparative results of the proposed method and the methods of [23] and [24] on both datasets, based on the average F-measure. The FCN method [24] performs better than the conventional statistical method [23]. The results show that extracting BAG noise from individual objects enables the proposed method to perform much better than [23].
A method is robust if its performance remains stable even after post-processing operations are applied to the spliced image. To evaluate the proposed method's robustness, we applied JPEG compression with different quality factors, Gaussian blur, and additive Gaussian noise to all the spliced images and tested the method on the results.
For JPEG compression, eight quality factors ranging from 20 to 90 are considered. For Gaussian blur, a Gaussian smoothing kernel with standard deviation σ = 1.0 is used, and for Gaussian noise, variances of 0.03 and 0.05 are considered.
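As an illustration of the blur setting, a Gaussian smoothing kernel with σ = 1.0 can be built as in the sketch below. The kernel radius of 3 and the clamped border handling are assumptions, as the text does not state them, and the function names are hypothetical. Because the 2-D Gaussian is separable, applying the 1-D kernel along rows and then along columns gives the full blur.

```python
import math


def gaussian_kernel_1d(sigma=1.0, radius=3):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, kernel):
    """Filter one row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(k * row[min(max(i + j - r, 0), n - 1)]
                for j, k in enumerate(kernel))
            for i in range(n)]
```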
The evaluation results on the IMD are shown in Table II. As the quality factor (QF) of JPEG compression decreases and additional post-processing operations are included, the average F-measure of the FCN and NGLCM methods decreases. In contrast, the proposed method shows superior and stable performance even in these situations.
The IMD images are of very high resolution, and we reduce the quality factor down to the lowest level, 20. Fig. 5 compares the proposed method's performance with that of the other existing methods. The average F-measure of both the FCN+RPN and NGLCM methods decreases as the JPEG compression quality factor is reduced towards 20. The proposed method outperforms them and remains stable even when the quality factor is reduced, because the BAGs are affected only in the spliced objects and not in the rest of the image.

B. Computational Complexity
The practicality of a method also depends on the average computation time needed to obtain the desired result. In the proposed method, after segmenting the individual objects, we obtain BAG features from each object instead of from the whole image, which saves considerable computation time. For localization, we also use a simple statistical method instead of unsupervised learning techniques. Table III gives the average running time of each method. The proposed method takes less time than the other methods.

IV. CONCLUSION
This paper proposed an efficient method for splicing localization based on block artificial grids in a double-compressed JPEG image. When a JPEG image is spliced with an object from another image, the block artificial grids move from the 8x8 grid lines towards the block centres. Using this clue, we exposed splicing forgery through object segmentation. The method is straightforward and more effective than other conventional methods that use JPEG fingerprints. The proposed method is also robust even when the quality factor is low in high-resolution JPEG compression. The method fails on low-resolution images, which we leave as future work.