Enhancement of 3D Seismic Images using Image Fusion Techniques

Seismic images are formed from data collected by sending seismic waves into the Earth's subsurface and recording their reflections; they provide information about subsurface structure. Seismic attributes are quantities derived from seismic data that provide complementary information. Enhancing seismic images by fusing them with seismic attributes can improve subsurface visualization and reduce the processing time. In seismic data interpretation, fusion techniques have so far been used to enhance the resolution and reduce the noise of a single seismic attribute. In this paper, we investigate the enhancement of 3D seismic images by using image fusion techniques and neural networks to combine seismic attributes. We evaluate the feasibility of using image fusion models pretrained on specific image fusion tasks: models that achieved the best results on their respective tasks are tested on seismic image fusion. The experiments show that image fusion techniques can combine up to three seismic attributes without distortion; future studies can increase this number. This is the first study to use models pretrained on other types of images for seismic image fusion, and the results are promising.
Keywords—Image fusion; seismic image; seismic attribute


I. INTRODUCTION
Seismic images are formed from data gathered during exploration of the Earth's subsurface by sending seismic waves into the ground and recording their reflections. They provide subsurface structural information and allow the subsurface to be modeled and visualized [1]. Seismic attributes are quantities derived from seismic data that supplement and emphasize certain information, making the visualization more informative [2]. To create an accurate representation of the subsurface, a geophysicist needs to examine the seismic image together with its corresponding seismic attributes, interpreting a large amount of information simultaneously, which is cumbersome. Therein lies the challenge: combining multiple views into a single view that effectively exploits the information contained in all individual views and shortens the process.
Several approaches have addressed the challenge of combining seismic attributes, such as volume blending, crossplotting, principal component analysis (PCA) and octree-based methods [3], [4]. The most recent and relevant is the work of Al-Dossari et al. [4], who extended the octree color quantization algorithm to increase the number of seismic attributes that can be combined. Its main limitations are that the number of attributes is limited to eight, that the order of the attributes affects the results, and that the combined images contain artifacts.
Alfarraj et al. [5] proposed a multiscale fusion technique to enhance seismic geometric attributes: a Gaussian pyramid generates different scales of an attribute, and these scales are then fused to obtain an enhanced attribute. This technique reduces noise and improves resolution.
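A minimal sketch of Gaussian-pyramid multiscale fusion of a single 2D attribute slice is shown below. It assumes a 5-tap binomial kernel as the Gaussian approximation, factor-2 decimation, nearest-neighbour upsampling and plain averaging of the scales; the exact kernel and fusion rule used in [5] are not given here, so these choices are illustrative only.

```python
import numpy as np

# 5-tap binomial kernel: a common approximation of a Gaussian (assumed, not taken from [5]).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable Gaussian-like blur with edge replication at the borders."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 2, mode="edge")
    horiz = sum(KERNEL[k] * padded[:, k:k + w] for k in range(5))  # shape (h + 4, w)
    return sum(KERNEL[k] * horiz[k:k + h, :] for k in range(5))    # shape (h, w)

def gaussian_pyramid(img, levels):
    """Level 0 is the input; each further level is blurred and decimated by 2."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid

def upsample(img, factor, shape):
    """Nearest-neighbour upsampling by an integer factor, cropped to `shape`."""
    out = np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)
    return out[:shape[0], :shape[1]]

def multiscale_fuse(attribute, levels=3):
    """Bring every pyramid level back to full size and average them (assumed fusion rule)."""
    pyramid = gaussian_pyramid(attribute, levels)
    scales = [upsample(level, 2 ** k, attribute.shape) for k, level in enumerate(pyramid)]
    return np.mean(scales, axis=0)
```

Averaging the coarse (smoothed) scales with the original attenuates uncorrelated noise, so for a noisy input the output has a lower standard deviation while keeping the input's shape.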
Rather than using fusion only to enhance the resolution of a single attribute, fusion techniques can be extended to enhance seismic data by combining multiple attributes. Combining multiple images into a single one is a need common to several domains, such as photography and medical imaging, and one of the techniques used to address it is image fusion.
Image fusion combines two or more input images containing complementary details of the same scene into a new image [6]. The input images are taken either by the same imaging device with different parameters or by different imaging devices, and the resulting fused image is more informative than any individual input image [7]. Image fusion techniques have shown substantial benefits in image processing tasks that depend on viewing multiple images of the same scene, such as object detection and recognition, and in a variety of fields, for example digital photography, medical imaging and remote sensing [6], [7]. In these tasks, combining the significant details of multiple input images into a single fused image can often reduce the difficulty and improve the outcome of the task [6]. The information enhancement that seismic attributes provide for a seismic image is similar to many image fusion tasks, such as medical imaging and remote sensing. Conceptually, we can treat different seismic images and attributes as different types of medical images, e.g., the seismic (raw) image as magnetic resonance imaging (MRI), one attribute as positron emission tomography (PET), and another attribute as computed tomography (CT).
Recently, with the rise of Deep Learning (DL), many DL-based image fusion methods have been proposed. DL is a class of Machine Learning algorithms that excel at feature extraction and image representation using neural networks. Convolutional Neural Networks (CNNs) can overcome the issues of conventional hand-crafted approaches, namely designing fusion methods and selecting fusion rules and activity-level metrics, because they learn features implicitly through training on data. Since image fusion tasks are very similar to classification tasks, at which CNNs excel, CNN-based methods achieve better results [8].
To accomplish this task, we propose a deep learning method built from neural networks. The method extracts image features and then fuses them into one image. It receives 3-dimensional (3D) image data, where each 3D volume represents either the seismic (raw data) image or a seismic attribute. The method then slices the 3D data and forwards the resulting 2-dimensional (2D) images as input to the fusion model, which extracts the important information from the input images using convolutional layers and creates feature maps. The feature maps are then fused to create the output image. Finally, the method reconstructs the output as 3D data.
We conduct experiments to analyze the proposed method using different fusion models. These models have been pretrained on datasets belonging to other image fusion tasks and achieved excellent results compared to other models on those specific tasks. We use pretrained models because of the lack of available seismic image datasets with ground-truth fusion images, which hinders training. We investigate how task-specific models perform on seismic images, looking for commonality between seismic fusion and other tasks, whether due to data similarity or to a model's ability to generalize to seismic images. Thus, the experiments analyze the similarities between the seismic image fusion task and other fusion tasks.
Our paper is structured as follows. In Section II, we briefly review related works. In Section III, the proposed fusion method is introduced in detail. The experiment results are shown in Section IV. The conclusion of our paper and discussion are presented in Section V.

II. RELATED WORK
In this section, we briefly review the extended octree quantization method and highlight its limitations. We then review the work done so far in image fusion to identify the approaches most relevant to the problem of combining seismic attributes.
Al-Dossari et al. [4] proposed using the octree color quantization algorithm to combine groups of attributes into a single one. Octree color quantization originated in image processing as a method for compressing colors; by extending it from three nodes to eight nodes, the method handles eight groups of attributes to form a single attribute, and it was tested on combining up to eight seismic attributes. The method has the following limitations: the order of the chosen attributes affects the result, so multiple octree sequences must be averaged to rectify this, and the method can be applied to at most eight attributes.
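To make the octree idea concrete: in classic octree color quantization, each tree level consumes one bit from each of the R, G and B channels, and those three bits select one of 2^3 = 8 children. The sketch below computes that child index. Passing more than three channels (giving 2^k children per node) is a hypothetical generalization for illustration only; it is not necessarily how [4] extends the method.

```python
def child_index(values, depth, bits=8):
    """Child slot selected at `depth` of an octree-style colour tree.

    values: one integer per channel (R, G, B in classic octree quantization),
            each in the range [0, 2**bits).
    depth:  0 at the root; the most significant bit is consumed first.
    Returns an index in [0, 2**len(values)).
    """
    shift = bits - 1 - depth
    index = 0
    for v in values:
        # Append this channel's bit at the current depth to the index.
        index = (index << 1) | ((v >> shift) & 1)
    return index

def leaf_path(values, bits=8):
    """Sequence of child indices from the root down to the leaf for one colour."""
    return [child_index(values, d, bits) for d in range(bits)]
```

For the RGB color (255, 0, 128), the root-level bits are 1, 0, 1, so `child_index((255, 0, 128), 0)` selects child 5; at depth 1 the bits are 1, 0, 0, selecting child 4.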

A. Overview of Image Fusion
Image fusion is a subfield of image processing. It is the process of combining multiple input images captured by one or more sensors into a single image [6]. The aim of image fusion is not only to reduce the amount of data, but also to create images that are more suitable for, and more comprehensible to, human and machine perception. It makes it possible to collect information from multisource images and generate a high-quality fused image that retains the spatial and spectral information [7].
A fused image must satisfy the following conditions: (1) it includes all pertinent information, (2) it is free of artifacts and anomalies, and (3) it has noise and errors removed. Major applications of image fusion include remote sensing image fusion, medical image fusion and multi-focus image fusion [6].
The general image fusion strategy consists of the following steps: acquisition of the input images, image-to-image registration, and fusion. Image registration entails feature detection, alignment and matching, estimation of the transformation model, and image transformation and resampling. Fusion rules are applied either directly as a mathematical operation, such as averaging or choosing the maximum pixel value, or as part of the image transformation model [9].
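The two direct fusion rules mentioned above can be sketched with NumPy as follows, assuming the inputs are already registered and have the same shape. Note that with the choose-max rule every fused pixel is copied from one of the inputs.

```python
import numpy as np

def fuse_average(*images):
    """Averaging rule: each fused pixel is the mean of the co-located input pixels."""
    return np.mean(np.stack(images), axis=0)

def fuse_max(*images):
    """Choose-max rule: each fused pixel is the largest co-located input pixel."""
    return np.max(np.stack(images), axis=0)

# Two registered 2x2 inputs.
a = np.array([[1.0, 4.0], [2.0, 0.0]])
b = np.array([[3.0, 2.0], [2.0, 5.0]])
fused_avg = fuse_average(a, b)  # element-wise means
fused_max = fuse_max(a, b)      # element-wise maxima
```

Both functions accept any number of inputs, so the same rules apply unchanged when a third image is added.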
DL-based image fusion has shown considerable potential for improving image fusion techniques. The basic CNN architecture consists of two parts: a feature extractor and a classifier. The feature extractor uses convolutional and pooling layers to extract the salient features of the inputs and represent them as activation maps, which corresponds to the image registration step of image fusion. The classifier can be used as is or modified to apply fusion rules to the maps, which corresponds to the fusion step. In addition, because CNNs are trained on large datasets, they can learn multiple fusion rules, avoiding one of the limitations of classical fusion methods. Fig. 1 shows the basic architecture of CNNs.

B. Image Fusion Applications
Image fusion can be grouped into several classes according to the task it performs: (1) multi-focus image fusion, which fuses images with different focus depths to create an all-in-focus image; (2) visible/infrared image fusion, which fuses infrared and visible-light images to create a more informative image; (3) medical image fusion, which fuses images used in the medical field, such as MRI and CT, to create a more informative image; and (4) multi-exposure image fusion, which fuses images taken with different exposures under different lighting to create a superimposed image [10].
For multi-focus image fusion, Liu et al. [11] used a CNN to address a binary classification task in the spatial domain. They implemented a Siamese network structure, in which two identical networks work together to generate one output. Du and Gao [12] extended the work of Liu et al. [11] by adding a multi-scale decomposition of the inputs before feeding them to the network. The input images are segmented into three overlapping stacks of three different sizes, and the network generates three focus maps that are averaged into a single fused focus map. Their results demonstrate that the method benefits from the multi-scale strategy in terms of performance, at the cost of increased computation. Another issue most multi-focus CNN models face is that, although they refine the decision maps, the initial segmented maps contain many errors. To overcome this, Amin-Naji et al. [13] proposed an Ensemble of CNNs (ECNN) framework that uses ensemble learning to improve the generation of decision maps and to learn from several datasets. The method uses three CNNs trained on three different datasets to create the initial decision maps; the authors used the COCO dataset and transformed it to create the two additional datasets. Each CNN has four convolutional layers of sizes 32×64×64, 16×32×128, 8×16×128 and 4×8×256, with a 3×3 kernel, 1×1 stride, 1×1 padding, a non-linear activation and 2×2 max-pooling for every convolutional layer. For CNN2 and CNN3, the output of the 8×16×128 layer is concatenated as input to the 4×8×256 layer, and that output is then concatenated with CNN1 as input to a 4×8×256 convolutional layer. The concatenated output is fed to a fully connected layer of size 4×8×512 that classifies each pixel. The novelty of the method lies in its input feeding mechanism: instead of feeding focused and unfocused images through separate branches, the focused and unfocused parts of the image are fed together. As a result, it outperformed the other models it was compared with, achieving state-of-the-art results.
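Multi-focus methods such as [11]-[13] ultimately turn a per-pixel decision map D into a fused image, F = D·A + (1 − D)·B. The sketch below shows that final step. The ECNN classifier is not reproduced; as a clearly labeled stand-in, the decision map here comes from a simple local gradient-energy focus measure.

```python
import numpy as np

def local_energy(img, radius=1):
    """Stand-in focus measure (NOT the ECNN classifier): windowed sum of squared gradients."""
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    h, w = img.shape
    padded = np.pad(energy, radius, mode="edge")
    size = 2 * radius + 1
    return sum(padded[i:i + h, j:j + w] for i in range(size) for j in range(size))

def decision_map(a, b, radius=1):
    """D = 1 where image A looks sharper (higher local energy), else 0."""
    return (local_energy(a, radius) >= local_energy(b, radius)).astype(float)

def fuse_with_decision_map(a, b, d):
    """F = D * A + (1 - D) * B: each pixel is copied from whichever input D selects."""
    return d * a + (1.0 - d) * b
```

With a binary D, every fused pixel comes from exactly one input, which is the "pick the best parts" behavior of multi-focus fusion.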
For visible/infrared image fusion, Zhong et al. [14] created a model that performs both image fusion and resolution enhancement. First, the input images are upscaled and decomposed; then SR-CNN [15] is used to improve the resolution of the high-frequency maps. The high-frequency coefficients are fused using the choose-max rule, while the low-frequency coefficients are fused using the averaging rule, and the fused image is created by combining the fused high- and low-frequency coefficients. This model has produced good results in medical and multi-focus image fusion in addition to visible/infrared image fusion. Li and Wu [16] proposed "DenseFuse," an encoder-decoder model for the fusion of infrared and visible images. The model uses a dense block in the encoder to improve the flow of information between layers and avoid information degradation. The encoder has two components: a convolutional layer of size 3×3×1×16 and a dense block with three cascaded convolutional layers of sizes 3×3×16×16, 3×3×32×16 and 3×3×48×16. The encoder output is fed to a fusion layer that fuses the feature maps. The decoder has four convolutional layers of sizes 3×3×64×64, 3×3×64×32, 3×3×32×16 and 3×3×16×1; it receives the fused feature maps and reconstructs the image. The results showed state-of-the-art performance compared to other models and indicated that the model can be applied to other fusion tasks with appropriate modification.
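The fusion rules used by Zhong et al. [14] (choose-max for high-frequency content, averaging for low-frequency content) can be sketched as below. Assumptions: a box blur stands in for their decomposition, the SR-CNN step is omitted, and choose-max compares absolute coefficient values.

```python
import numpy as np

def box_blur(img, radius=2):
    """Low-pass filter (stand-in decomposition): mean over a (2r+1) x (2r+1) window."""
    h, w = img.shape
    padded = np.pad(img.astype(float), radius, mode="edge")
    size = 2 * radius + 1
    return sum(padded[i:i + h, j:j + w] for i in range(size) for j in range(size)) / size ** 2

def frequency_split_fuse(a, b, radius=2):
    """Average the low-frequency parts, choose-max (by magnitude) the high-frequency parts."""
    low_a, low_b = box_blur(a, radius), box_blur(b, radius)
    high_a, high_b = a - low_a, b - low_b
    low = (low_a + low_b) / 2.0                                          # averaging rule
    high = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)    # choose-max rule
    return low + high
```

Splitting the bands this way keeps the strongest edges from either input while blending the smooth background.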
For medical image fusion, Liu et al. [17] incorporated a multi-scale decomposition framework, instead of spatial-domain fusion, into the method proposed in [11]. The framework decomposes the input images using a Laplacian pyramid and also passes them through the CNN to obtain a weight map, which is then decomposed using a Gaussian pyramid; a similarity-based strategy adjusts the fusion rules. The Laplacian and Gaussian decompositions are fused, and the fused image is created by Laplacian pyramid reconstruction. Liu et al. [18] used Convolutional Sparse Representation (CSR) for image fusion. In their method, the input images are decomposed into a detail layer and a base layer. Convolutional Sparse Coding (CSC) is applied to the detail layer to obtain sparse coefficient maps. After several intermediate calculations, the "choose max" rule is applied to produce a fused coefficient map, which is used together with dictionary filters to create the fused detail layer. Dictionary filters are a set of filters trained on natural scene images. Most medical image fusion models cannot preserve the energy levels of the input images in the fused image. Yin et al. [19] proposed a model that uses a Parameter-Adaptive Pulse-Coupled Neural Network in the Non-Subsampled Shearlet Transform domain (NSST-PAPCNN), which maintains the energy level in the fused image. A PCNN is a biologically inspired artificial neural network [20]. The paper uses the parameter-adaptive PCNN to increase convergence speed and reduce computation, and uses NSST for enhanced detail extraction. NSST-PAPCNN performs four steps: NSST decomposition, high-frequency band fusion, low-frequency band fusion and NSST reconstruction. NSST decomposition extracts image details using shearlet filters, and the parameter-adaptive pulse coupling updates the neurons iteratively during the fusion process. The model exhibits state-of-the-art performance and outperforms existing methods.
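The pyramid-domain fusion of Liu et al. [17] can be sketched as follows: each source is decomposed into a Laplacian pyramid, a weight map is decomposed into a Gaussian-style pyramid, each level is blended as w·L_A + (1 − w)·L_B, and the result is reconstructed. Assumptions: 2×2 block averaging replaces Gaussian filtering, the weight map is supplied by the caller rather than produced by a CNN, and the similarity-based rule adjustment is omitted.

```python
import numpy as np

def down(img):
    """Halve resolution by 2x2 block averaging (odd sizes padded by edge replication)."""
    h, w = img.shape
    img = np.pad(img, ((0, h % 2), (0, w % 2)), mode="edge")
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def up(img, shape):
    """Nearest-neighbour doubling, cropped to `shape`."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels):
    pyramid, cur = [], img.astype(float)
    for _ in range(levels - 1):
        smaller = down(cur)
        pyramid.append(cur - up(smaller, cur.shape))  # band-pass detail at this scale
        cur = smaller
    pyramid.append(cur)                               # coarsest (low-pass) level
    return pyramid

def weight_pyramid(weight, levels):
    pyramid = [weight.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(down(pyramid[-1]))
    return pyramid

def reconstruct(pyramid):
    img = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        img = detail + up(img, detail.shape)
    return img

def pyramid_fuse(a, b, weight, levels=3):
    """Blend each Laplacian level of A and B with the matching weight level, then rebuild."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gw = weight_pyramid(weight, levels)
    return reconstruct([w * x + (1 - w) * y for w, x, y in zip(gw, la, lb)])
```

Because each detail level is defined with the same `up` operator used in `reconstruct`, the decomposition is exactly invertible, so a weight map of all ones returns A and all zeros returns B.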
For multi-exposure fusion, Kalantari and Ramamoorthi [21] addressed motion artifacts in dynamic scenes by using a CNN to create High Dynamic Range (HDR) images. To generate the HDR image directly, three aligned Low Dynamic Range (LDR) images are used as input to the CNN. The authors implemented three different network structures to investigate how the output changes. The loss function is the Euclidean distance between the tone-mapped ground-truth and estimated HDR images. Prabhakar et al. [22] proposed "DeepFuse," a model for multi-exposure fusion that takes an unsupervised approach to the fusion process. The authors also created a new benchmark dataset and trained the network on it to improve the model's learning. The CNN takes a pair of input images and converts each into the YCbCr color space. The luminance channel Y, which carries the fundamental image details, is processed by the CNN to exploit its feature extraction capabilities. The remaining Cb and Cr channels of the inputs are fused using a weighted-average rule, producing fused Cb and Cr channels that are combined with the fused Y channel to generate the fused image, which is then converted back to RGB. DeepFuse is an encoder-decoder model with the following structure: the encoder has two input branches, each with two convolutional layers of sizes 5×5×1×16 and 7×7×16×32, respectively. A fusion layer using the addition rule fuses the feature maps from both branches. The decoder has three convolutional layers of sizes 7×7×32×32, 5×5×32×16 and 5×5×16×1; it receives the fused feature maps and reconstructs the image. DeepFuse achieved state-of-the-art results compared to other multi-exposure fusion models. The results also showed that DeepFuse, trained on multi-exposure data, performs well on multi-focus fusion, and that its CNN filters are general enough to be used for other image fusion tasks.
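The color handling in DeepFuse can be sketched as follows. The Cb/Cr weighting below, where each input's chroma is weighted by its distance |C − τ| from the neutral value τ = 128, follows our reading of the DeepFuse paper and should be checked against [22]; the RGB/YCbCr conversion uses the full-range BT.601 (JPEG) coefficients; and the CNN that fuses the Y channel is replaced by a caller-supplied function (a plain average by default), so this is not the DeepFuse network itself.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JPEG) conversion; rgb has shape (..., 3) with values in [0, 255]."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return np.stack([r, g, b], axis=-1)

def fuse_chroma(c1, c2, tau=128.0):
    """Weighted average: a channel far from neutral (tau) carries more colour, so more weight."""
    w1, w2 = np.abs(c1 - tau), np.abs(c2 - tau)
    denom = w1 + w2
    safe = np.where(denom == 0, 1.0, denom)  # avoid division by zero
    # Where both inputs are neutral (both weights 0), fall back to a plain average.
    return np.where(denom == 0, (c1 + c2) / 2, (c1 * w1 + c2 * w2) / safe)

def deepfuse_style(rgb1, rgb2, fuse_y=lambda y1, y2: (y1 + y2) / 2):
    """fuse_y is a placeholder for the luminance-fusing CNN (hypothetical hook)."""
    y1, cb1, cr1 = rgb_to_ycbcr(rgb1)
    y2, cb2, cr2 = rgb_to_ycbcr(rgb2)
    return ycbcr_to_rgb(fuse_y(y1, y2), fuse_chroma(cb1, cb2), fuse_chroma(cr1, cr2))
```

Separating luminance from chroma lets the learned part of the model focus on structure (Y), while color is merged by a cheap closed-form rule.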
Our review found that pretrained models have not been used for seismic image fusion, and that seismic images and medical images are similar in two respects: the data capture method and the semantic importance of the different images used in the fusion process. In data capture, both seismic and medical imaging acquire images along a survey line using different kinds of waves to represent a 3D object [23]. In addition, the inputs to both seismic and medical image fusion contain a different type of information in each image. Although the multi-exposure, infrared/visible and multi-focus fusion tasks share no such similarities with seismic fusion, results showed that training models on composite or detailed scenes allows them to generalize better to other tasks [24].

III. PROPOSED METHOD
The proposed image fusion method supports fusing seismic data with one or more seismic attributes, as follows (see Fig. 2). Suppose there are $M \ge 2$ inputs to the method, all 3D images of identical size: one seismic data volume, denoted $I_R$, and $N = M - 1$ seismic attribute volumes, denoted $I_{A_n}$, $n \in \{1, 2, \dots, N\}$. The inputs $I_R$ and $I_{A_n}$ are first sent to a slicing function that converts each 3D volume $(x, y, z)$ into $Z$ 2D images $(x, y)$. The output of the slicing function is fed to the fusion model, which, for each $z = 1, \dots, Z$, accepts a set of images containing slice $z$ of $I_R$ and of every $I_{A_n}$. After all fused images have been created by the fusion model and the fusion metrics have been calculated, the fused images are assembled back into 3D image data using the inverse of the slicing function.
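A minimal sketch of this slicing, fusing and restacking pipeline, assuming NumPy arrays indexed as (x, y, z). The `fuse_slices` argument is a hypothetical hook for any 2D fusion model; a pixel-wise mean is used only as a placeholder in the usage example.

```python
import numpy as np

def slice_volume(volume):
    """Split an (x, y, z) volume into a list of Z two-dimensional (x, y) slices."""
    return [volume[:, :, z] for z in range(volume.shape[2])]

def stack_slices(slices):
    """Inverse of slice_volume: rebuild the (x, y, z) volume."""
    return np.stack(slices, axis=2)

def fuse_volumes(raw, attributes, fuse_slices):
    """raw: I_R; attributes: [I_A1, ..., I_AN]; fuse_slices: any 2D fusion model (hypothetical hook)."""
    inputs = [raw, *attributes]
    if len(inputs) < 2:
        raise ValueError("need M >= 2 inputs")
    if any(v.shape != raw.shape for v in inputs):
        raise ValueError("all inputs must be 3D volumes of identical size")
    per_input = [slice_volume(v) for v in inputs]
    # For each depth z, hand the model one slice from every input volume.
    fused = [fuse_slices([s[z] for s in per_input]) for z in range(raw.shape[2])]
    return stack_slices(fused)

# Usage with a placeholder model (pixel-wise mean), not one of the pretrained fusion models.
mean_model = lambda slices: np.mean(slices, axis=0)
raw = np.zeros((4, 3, 5))
attr = np.ones((4, 3, 5))
fused = fuse_volumes(raw, [attr], mean_model)
```

Keeping the fusion model behind a single function argument is what allows the same pipeline to be run with each pretrained model and with M = 2 or M = 3 inputs.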

A. Fusion Model
We chose the models that perform best in their respective fusion tasks, based on the review in Section II, and compare their performance on the seismic image fusion task. Table I shows a summary of the selected fusion models.

B. Fusion Metrics
The image fusion performance metrics used by most of the methods in [10,13,16,18] are used to compare the performance of the models. Assessing non-referenced fusion is not straightforward because ground-truth images are not available, so several non-referenced image fusion metrics are needed to evaluate the models' performance. The metric formulas can be found in [25]. We used the following metrics:
1) Entropy (EN): measures the information content of the fused image.
2) Information Transfer ($Q^{AB/F}$): measures the total information transferred from the source images to the fused image.
3) Modified Fusion Artifacts ($N^{AB/F}$): measures the artificial artifacts generated by fusion.
4) Feature Mutual Information (FMI): measures the dependency between the input images and the fused image.
5) Mutual Information (MI): measures the similarity of image intensity between the fused image and the reference (source) images.
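Two of these metrics are simple enough to sketch directly. The code below assumes 8-bit grey levels (256 histogram bins) and the common definitions EN = −Σ p_i log2 p_i and MI = MI(A, F) + MI(B, F); [25] remains the authoritative source for the exact formulas, and $Q^{AB/F}$, $N^{AB/F}$ and FMI are omitted.

```python
import numpy as np

def entropy(img, bins=256):
    """EN = -sum_i p_i log2 p_i over the grey-level histogram (assumes values in [0, 256))."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y, bins=256):
    """MI(X, Y) from the joint grey-level histogram, in bits."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])).sum())

def fusion_mi(a, b, fused):
    """MI fusion metric for two sources: MI(A, F) + MI(B, F)."""
    return mutual_information(a, fused) + mutual_information(b, fused)
```

An image split evenly between two grey levels has EN = 1 bit, and the MI of an image with itself equals its entropy, which makes a quick sanity check for any implementation.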

IV. EXPERIMENTS AND RESULTS
We conducted experiments on the models discussed in Section II, using the pretrained models published by [10,13,16,18]. We designed two experiments: the first takes two inputs and the second takes three. The first experiment ascertains whether the models can combine two different seismic images. The second is used to show that image fusion can raise the input limit without distortion, by combining three different seismic images without distorting the resulting image. The common goal of both experiments is to measure and evaluate the performance metrics of the four selected fusion models on our dataset. Running the experiments with different numbers of inputs also lets us examine how the measurements change as more inputs are added. To evaluate the fusion results, we perform a visual comparison alongside the quantitative comparison, assessing the visual representation of the fused image (such as texture and color) as well as its structural information.
In the first experiment, the number of inputs $M$ is 2: one input is a seismic image ($I_R$) and the second is a seismic attribute called skeleton, created by a skeletonization algorithm [26] and denoted $I_A$. Both $I_R$ and $I_A$ have size (876, 221, 271).
In the second experiment, the number of inputs $M$ is 3: in addition to $I_R$ and $I_A$, another seismic attribute called coherence, denoted $I_{Ac}$, is used. $I_{Ac}$ has the same size as $I_R$ and $I_A$.
There are four sets of $I_R$, $I_A$ and $I_{Ac}$ for the experiments; a sample is shown in Fig. 3. The displayed images are downscaled and cropped to fit the space limit; the experiments are performed on the original images. The results of Experiments 1 and 2 are presented in Table II and Table III, respectively, as averages over the four sets. Results in bold signify the best performance and underlined results the second best. The results show the performance of the models with different numbers of inputs. Experiment 2's fusion had more inputs, so the amount of information in the fused images increased, as indicated by larger values of EN and MI, and the amount of artificial fusion noise, indicated by $N^{AB/F}$, increased as well. The discussion below therefore focuses on the results of Experiment 2, since it contains more information.
Due to the space limit, the fusion results for only two sets of images are shown in Fig. 4 and Fig. 5.
As can be seen, the DeepFuse model outperformed all other models. It had the best values of EN, $N^{AB/F}$ and MI and the second-best values of $Q^{AB/F}$ and FMI. High EN values indicate that the fused image contains a large amount of information and that the model is good at feature extraction. High FMI, $Q^{AB/F}$ and MI values indicate that the fused image maintains structural information and features. Low $N^{AB/F}$ values indicate less artificial noise due to fusion. Visually, the fused image contains all the structural information from the inputs: all edges are clear, the color and texture of the inputs are present, and there is no noticeable fusion noise. The performance of DeepFuse can be attributed to the adaptability of its filters: it was trained on a large dataset of high-resolution complex images, which allowed the filters to learn features that transfer remarkably well to other fusion tasks.
The ECNN model had the best FMI values and the second-best $N^{AB/F}$ values. This can be attributed to the fusion rule of the multi-focus task, which selects the maximum pixel value: each fused pixel value comes from one of the inputs, so there is high dependency between the inputs and the fused image. However, as the example in Fig. 5 shows, ECNN has the worst visual representation of the fused information: the fused image is missing much structural information, and its color and texture do not match the input images. This may stem from the nature of the multi-focus task, whose aim is to extract the best parts of multiple input images and assemble a new image from them, which does not match the desired outcome here. Thus, it can be inferred that multi-focus fusion models are not suitable for the seismic data fusion task.
The NSST-PAPCNN model had the best $Q^{AB/F}$ values, which can be attributed to its ability to preserve energy levels. Seismic images and medical images are similar in that both are collected using surveys and each input image carries a different type of information that must be maintained. However, the model produces the most artificial noise, with the highest $N^{AB/F}$ values. Visually, the fused image emphasizes edges and lines but does not capture the color and texture of the input images, and as more inputs are added the fusion noise increases, creating a shadow effect on the image.
Finally, the DenseFuse model had the second-best EN and MI values, indicating its ability to extract and maintain the structural information of the input images; this can be attributed to the dense block preserving the images' structure. The model had the second-best overall performance. Visually, the fused image contains all the structural information from the inputs, all edges are clear, and there is no noticeable fusion noise. The visible difference between the best performer, DeepFuse, and the second best, DenseFuse, is that DeepFuse captures the color and texture of the inputs better and with less artificial noise.
By comparison, the previous work [4] combined a limited number of seismic attributes (eight) and cannot add more because the results become cluttered. Our research is a work in progress, and we plan to experiment with adding more attributes; in principle, our method can accept any number of attributes. Moreover, as our experiments show, adding a third attribute improved the results and did not clutter the images or render them unreadable.

V. CONCLUSION
In this paper, we investigated the enhancement of 3D seismic images using image fusion and deep learning. Fusion techniques had previously been used to enhance the resolution and reduce the noise of a single seismic attribute. This study evaluated the feasibility of using image fusion models pretrained on other image fusion tasks for seismic data fusion. We presented a method for enhancing 3D seismic images by slicing the 3D data and attributes into 2D images and feeding them to a fusion model. We chose four models pretrained for different image fusion tasks, tested the method with each, and evaluated the results using quantitative fusion metrics. The experimental results show that the DeepFuse model produces good fusion results on seismic images: both the quantitative metrics and the visual quality of its fused images are better than those of the other models. The results also show that models pretrained for multi-focus image fusion are not suitable for seismic image fusion. In comparison to previous work, our results show that increasing the number of attributes does not distort the image. The experiments showed that image fusion techniques are capable of combining three seismic attributes, and in the future we will conduct studies to increase this number. This is the first study to use models pretrained on other types of images for seismic image fusion, and the results are promising.