Deep Learning Architectures and Techniques for Multi-organ Segmentation

Abstract: Deep learning architectures for automatic multi-organ segmentation in the medical field have gained increased attention in recent years, as their results have surpassed those of older techniques. Owing to improvements in computer hardware and the development of specialized network designs, deep learning segmentation also presents exciting opportunities for future research. We have therefore compiled a review of the most interesting deep learning architectures applicable to medical multi-organ segmentation. We summarize over 50 contributions, most of which were published within the last 3 years. The papers are grouped into three categories based on architecture: “Convolutional Neural Networks” (CNNs), “Fully Convolutional Neural Networks” (FCNs) and hybrid architectures that combine several designs, including “Generative Adversarial Networks” (GANs) or “Recurrent Neural Networks” (RNNs). Afterwards we present the most used multi-organ datasets, and we conclude with a general discussion of current shortcomings and potential future research paths.


I. INTRODUCTION
Medical imaging using Computed Tomography (CT), Magnetic Resonance (MR), ultrasound, X-ray, and other modalities has become an essential part of the detection, diagnosis, and treatment of diseases [1].
A new branch of medicine, imaging and radiology, was developed to train human experts who can interpret medical images and provide an accurate diagnosis. The training is challenging due to the complexity involved but, more importantly, the diagnosis process itself is tedious and exhausting work that is further complicated by the large variations in pathology between individuals. Therefore, the need for automated help has grown as the medical imaging sector expanded, with use cases such as segmentation of medical images, delineation of human organs, or automated diagnosis being intensively studied using Deep Learning (DL) architectures.
Deep learning absorbs the feature engineering designed by human experts into a learning step [2]. Furthermore, deep learning needs only a set of training/testing data with minor pre-processing (if necessary), and can then extract representations of the human body autonomously. Across different architectures, DL has demonstrated enormous potential in computer vision [3].
Multi-organ deep learning architectures could lend a helping hand in the field of radiation therapy by making the segmentation process faster and more robust [4]. Multi-organ segmentation also paves the way for automated processes that generalize to the full body or to a large spectrum of diseases, facilitating online adaptive radiotherapy and advancing the goal of medical image segmentation: to reach an accurate diagnosis autonomously in any medical imaging environment.

A. Segmentation Applications in the Medical Field
• Radiotherapy in cancer treatment. In radiotherapy, the radiation exposure of the target and of healthy organs must be controlled, so segmentation of organs at risk (OARs) could provide important help to physicians [5].
• Automation. OARs and other clinical structures in the human body are manually segmented by physicians from medical images, which is difficult, tedious and time consuming [4]. Automating the segmentation process could help tremendously, even if only as a pre-step in the diagnosis (used for initial selection of cases or pathologies).
• Finding ROIs. Automatically finding regions of interest (ROIs) could help when preparing for medical procedures or when applying specific procedures to highlighted regions.
• Computer Aided Diagnosis (CADx). A correct delineation of body structures is needed in the pipeline of any CADx system. Accurate automatic segmentation could be used in non-invasive diagnosis scenarios and could even be deployed online.
• Mass detection. Detecting the mass of organs has as prerequisites a correct segmentation of the organ and of the neighbouring surfaces.
• Assistance in endoscopic procedures. Automatic segmentation helps physicians when executing endoscopic procedures and could also be used in the training phase of the human experts [6].

B. Summary of other Reviews in the same Knowledge Field
The deep learning knowledge base was described in papers written by Schmidhuber [2], LeCun et al. [3], Benuwa et al. [7] and Voulodimos et al. [8]. More recently, notable articles were written by Serre et al. [9] and Alom et al. [10].
For a description of deep learning architectures specifically applied in the medical field, we would like to highlight works written by Litjens et al. [11], Shen et al. [1], Hesamian et al. [12], Zhou et al. [13], Ker et al. [14], Taghanaki et al. [15] and Lu et al. [16]. For details regarding GANs in medical image processing, see the article by Yi et al. [17], and for a review of unsupervised deep learning techniques, see the paper written by Raza et al. [18]. More recently, a comprehensive overview targeted towards multi-organ architectures was written by Lei et al. [4].

C. The Aim of this Study
This article discusses the most interesting deep learning architectures and techniques applicable to medical multi-organ segmentation. Targeted to DL-based medical image multiorgan segmentation, there are several objectives that we aimed to fulfil with this article:

D. Contents of the Survey
The paper summarizes over 50 contributions, most of which are more recent than 3 years.
In our process of data searching and gathering, we used several different sources, including arXiv, Google Scholar, PubMed, ISBI, MICCAI and SPIE Medical Imaging. Search keywords included medical segmentation, multi-organ, fully convolutional neural network, and other architectures related to deep learning. The final result contains at least 30 articles that describe architectures for single-organ segmentation and over 60 articles that detail deep learning techniques for multi-organ delineation.
To make this survey as recent as possible, we selected works that were mostly published after 2017, while still including older papers that had a big impact on the research field. The cut-off date for publication was set to June 1st, 2020; papers newer than that date were excluded.
The bulk of the reviewed works are in Sections II, III and IV and are grouped into three categories (CNNs, FCNs, and hybrid) according to the architecture and to which network design is most prominent. The hybrid category also has 3 subsections: GANs, RNNs and fully hybrid approaches. For each architecture class we present a short description of the methods and highlight the most relevant works related to multi-organ segmentation. For each included paper we list the reference, the human structures that were used in training, and a summary of its important features and achievements. In Section V we present the most used multi-organ datasets, correlated with the human structures that they target. We conclude with a discussion of the future of research on this subject.

II. ARCHITECTURES APPLICABLE TO MEDICAL MULTI-ORGAN SEGMENTATION BASED ON CNNS
A CNN is a type of deep neural network [12] that applies a convolution in at least one of its layers. A layer is made up of multiple neurons; in a fully connected layer, each neuron is linked to every neuron of the subsequent layer. Except for the initial layer, which is linked to the medical image, the input of each layer is the output of the preceding layer. Each layer can perform a specific task such as convolution, pooling, or loss calculation, and different architectures combine these layers in different ways.
Depending on the dimensionality of the input and of the convolutional kernels, CNNs can be grouped into three categories. In 2D architectures the medical image is sliced into several 2D images, which are fed to the CNN. 2.5D architectures still use 2D kernels, but the network is fed with several patches cut from a 3D medical image along the three orthogonal axes. The final category uses 3D kernels, which can extract the full information from a 3D medical image. The major downside of 3D architectures is their computational and memory requirements, which are considered large even on the most up-to-date hardware.
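To make the difference between the three input types concrete, the following is a minimal NumPy sketch, not taken from any of the surveyed papers. The function names, the (z, y, x) axis order and the toy volume are assumptions made for illustration, and the patch centre is assumed to lie far enough from the volume border.

```python
import numpy as np

def slices_2d(volume):
    """2D input: the volume is cut into individual slices along the first axis."""
    return [volume[z] for z in range(volume.shape[0])]

def patches_25d(volume, center, half):
    """2.5D input: three orthogonal 2D patches centred on the same voxel."""
    z, y, x = center
    axial = volume[z, y - half:y + half + 1, x - half:x + half + 1]
    coronal = volume[z - half:z + half + 1, y, x - half:x + half + 1]
    sagittal = volume[z - half:z + half + 1, y - half:y + half + 1, x]
    # Stacked as three channels that a network with 2D kernels can consume.
    return np.stack([axial, coronal, sagittal])

def patch_3d(volume, center, half):
    """3D input: a full cubic neighbourhood, to be processed with 3D kernels."""
    z, y, x = center
    return volume[z - half:z + half + 1,
                  y - half:y + half + 1,
                  x - half:x + half + 1]

# Toy volume standing in for a CT or MR scan (assumed axis order: z, y, x).
volume = np.arange(16 * 16 * 16, dtype=np.float32).reshape(16, 16, 16)
p25 = patches_25d(volume, center=(8, 8, 8), half=2)
p3 = patch_3d(volume, center=(8, 8, 8), half=2)
print(p25.shape, p3.shape)
```

For the same patch width, the 2.5D input holds three 2D patches while the 3D input holds the full cube of voxels, which illustrates why memory requirements grow quickly for 3D architectures.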
In Table I we present a list of papers that employ CNNs for segmentation in a multi-organ setup. Papers that present object detection methods in multi-organ scenarios were also included in this list, even though they do not produce a segmentation. The reason is that they could be used as a pre-step to the actual segmentation by generating regions of interest that improve the accuracy of the end result.
Table I
Ref. | Site | Important features
[19] | Brain, breast and cardiac | The authors demonstrate that a 2.5D CNN can be trained in a multi-modality (MRI and CT) scenario to segment tissues in three different human structures. The results were comparable to those obtained using three different architectures, one for each segmentation task.
[20] | Abdomen | The authors proposed an architecture that segments several abdominal organs using a two-step approach. Organ localization is obtained via a multi-atlas technique, followed by training a 3D CNN that classifies the voxels to the corresponding organ [20]. They also use thresholding as a pre-processing step.
[21] | Chest, cardiac, abdomen | The authors trained a 2.5D CNN that identifies whether target human structures are present in input images (CT) [21]. Bounding boxes can also be placed around the targeted structures that are found.
[22] | Brain, abdomen | The authors propose several methods that can improve the segmentation accuracy: supervised or unsupervised image enhancement and a novel loss function [22].
[23] | Thorax-abdomen | This work presents a 2.5D CNN trained for localization of several human structures in CT images [23].
[24] | Pelvic organs | The authors propose a novel hierarchical dilated CNN. The novelty is a multi-scale architecture comprised of several modules working at different resolutions [24].
[25] | Torso (17 organs) | The authors propose an architecture for organ localization and 3D bounding box generation [25].
[26] | Head and neck | The article proposes a multi-organ segmentation architecture that cascades three CNNs followed by majority voting [26].
[27] | Head and body | The authors propose an architecture for organ localization based on a 3D CNN that also improves the localization performance on small organs [27].
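The idea of using localization as a pre-step, mentioned above for the detection papers in Table I, can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the method of any cited paper: the helper `bounding_box`, the margin value and the synthetic coarse mask are all hypothetical.

```python
import numpy as np

def bounding_box(mask, margin=0):
    """Return slices for the tightest box around nonzero voxels,
    padded by `margin` voxels and clipped to the volume bounds."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        return None  # nothing was localized
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

# Synthetic volume and a coarse localization mask (assumed to come from
# a detection or localization network).
volume = np.random.default_rng(0).normal(size=(32, 64, 64))
coarse_mask = np.zeros(volume.shape, dtype=bool)
coarse_mask[10:20, 20:40, 30:50] = True

roi = bounding_box(coarse_mask, margin=2)
cropped = volume[roi]  # region of interest passed to the segmentation step
fraction = cropped.size / volume.size
print(cropped.shape, round(fraction, 3))
```

The segmentation network then only has to process the cropped region, which is a small fraction of the original volume in this toy case.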

III. ARCHITECTURES APPLICABLE TO MEDICAL MULTI-ORGAN SEGMENTATION BASED ON FCNS
CNNs can classify each individual voxel of a medical image, but this approach has a major drawback. Because the neighbouring patches on which convolutions are calculated have overlapping voxels, the same calculations are done multiple times, which penalizes performance. To counter this issue, Long et al. [28] proposed the "Fully convolutional network", in which the size of the predicted image is increased to match the size of the input image by using a transposed convolution layer. Ronneberger et al. [29] proposed the U-Net network, which has a contracting path with layers that include convolutions, max pooling and Rectified Linear Units (ReLU) [30], and an expanding path that involves up-convolutions and concatenations with high-resolution features from the contracting path [29]. Çiçek et al. [31] implemented the first 3D U-Net design, while Milletari et al. [32] improved the U-Net architecture by adding residual blocks and a Dice loss layer.
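Since the Dice loss is mentioned above and Dice scores are reported for many of the works below, a minimal NumPy sketch of the metric may help. It shows a hard Dice score per organ label and one common soft variant usable as a loss. The function names and the smoothing constant `eps` are assumptions, and published formulations differ in details such as the exact form of the denominator.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def dice_per_organ(pred_labels, true_labels, organ_ids):
    """Multi-organ case: one Dice score per integer organ label."""
    return {k: dice_score(pred_labels == k, true_labels == k) for k in organ_ids}

def soft_dice_loss(probs, target, eps=1e-7):
    """One soft variant: 1 - Dice computed on predicted probabilities."""
    intersection = (probs * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)

# Toy label maps: 0 = background, 1 and 2 = two hypothetical organs.
truth = np.zeros((4, 4), dtype=int)
truth[:2, :2] = 1
truth[2:, 2:] = 2
pred = truth.copy()
pred[0, 0] = 0  # one voxel of organ 1 is missed

scores = dice_per_organ(pred, truth, organ_ids=[1, 2])
print(scores)
```

In this toy case organ 1 loses one of its four voxels, so its score drops to about 6/7, while organ 2 is segmented perfectly.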
In Table II we present a list of papers that employ FCNs for segmentation in a multi-organ setup.
Table II
Ref. | Site | Important features
[33] | Liver and heart | The authors propose a 3D FCN enhanced by a deep supervision technique [33]. The architecture is validated against heart and liver datasets (not a full-blown multi-organ implementation).
[6] | Abdomen | The article proposes an approach for segmenting 4 abdominal organs using an FCN that employs "dilated convolution units with dense skip connections" [6].
[34] | Abdomen | In this article the authors show that a "multi-class 3D FCN trained for seven abdominal structures can achieve competitive segmentation results, while avoiding the need [for] training organ-specific models" [34]. They propose an architecture comprised of two FCNs, with the first delineating a candidate region, while the latter uses that as input for the final segmentation.
[35] | Esophagus, trachea, heart, aorta | The authors propose "two collaborative architectures to jointly segment multiple organs" [35]. The first network learns anatomical constraints, also employing conditional random fields, while its output is used by the second network to guide and refine the segmentation.
[36] | Liver, spleen, kidneys | The authors propose a deep 3D FCN for organ segmentation that is enhanced using a "time-implicit multi-phase evolution method" [36].
[37] | Torso and special regions: lumen and stomach content | The authors propose a 2.5D FCN architecture trained on CTs. The algorithm uses a fusion method for the final 3D segmentation. They summarize the algorithm as "multiple 2D proposals followed by 3D integration" [37].
[38] | Liver, left kidney | In this paper, the authors improve their previous segmentation architectures by adding an organ localization module [38].
[39] | Gastro-intestinal tract | The authors present an implementation of a Dense V-Net architecture in a multi-organ setup while showing that their proposed "dense connections and the multi-scale structure" [39] produce better segmentation results.
[40] | Abdomen | The paper describes an implementation of a 3D U-Net for multi-organ CT segmentation [40]. The authors obtained a combined Dice of 89.3% when testing 7 organs.
[41] | Abdomen | The authors present an architecture that is based on a "multi-scale pyramid of stacked 3D FCNs" [41]. The results are obtained by taking the predictions of a lower-resolution 3D FCN, up-sampling and cropping them, and afterwards concatenating them with the inputs of a higher-resolution 3D FCN, which generates the final segmentation.
[42] | Abdomen | The authors argue that the results of multi-organ segmentation using FCNs depend on the architecture but are also heavily influenced by the chosen loss function [42]. They also evaluate the loss function's influence in multi-organ segmentation scenarios.
[43] | Abdomen | The authors propose a cascaded approach that uses two 3D FCNs. The first architecture defines a candidate region, while the second focuses on the details and provides the final segmentation. The authors argue that their "approach reduces the number of voxels the second FCN must classify to ∼10%" [43].
[44] | Torso | The paper presents three 3D FCN architectures and surveys their results for multi-organ segmentation in the human torso. The Dice scores average between 0.91 and 0.98 for the 6 covered organs.
[45] | Head and neck | The authors present an architecture based on a 3D U-Net that is tested against a head and neck dataset. The results were mixed, with fair segmentation scores for 7 organs out of 11, but with low results for the other organs.
[46] | Brain, abdomen | The authors propose an FCN architecture [46] that outperforms the initial U-Net implementation in several segmentation tasks for brain or abdomen. The Dice scores range between 83.42% and 96.57% for several abdominal organs.
[47] | Chest | The authors propose an architecture based on two cascaded networks.
[48] | Abdomen | The authors present a novel architecture that improves the segmentation using a transfer learning scheme. 3D U-Nets are used in a general approach or in a single-organ approach, with transfer learning between them. Furthermore, probabilistic atlases are used to estimate the location of the organs.
[49] | Abdomen | The authors present an architecture for segmentation in a multi-organ scenario consisting of a "2D U-Net localization network and a 3D U-Net segmentation network" [49]. Compared to other architectures, the authors' results are better for several organs such as the prostate and bladder.
[50] | Abdomen | The authors propose a two-step architecture. The first step contains 2D networks with reverse connections that detect features. These features are afterwards merged with the original image to "enhance the discriminative information for the target organs" [50] and are used as input for the final segmentation network.
[51] | Gland | The paper describes two Dense U-Nets used for segmentation of several gland types.
[52] | Abdomen | The authors propose a multi-organ segmentation architecture based on 3D convolution [52]. Their design obtained an average Dice score of 83.7% for 6 abdominal organs in their targeted dataset.
[53] | Thorax | The authors propose an architecture where a 3D U-Net localizes each target organ. Afterwards, cropped images with one organ serve as input to several individual 3D U-Net segmentation networks, and as a final step the individual results are merged into a global segmentation result.
[54] | Thorax and abdomen | The paper describes in detail the SegTHOR [54] multi-organ dataset and presents a segmentation framework based on U-Net.
[55] | Thorax and abdomen | The paper proposes an architecture for segmentation of the SegTHOR [54] multi-organ dataset that consists of two 3D V-Nets working at different resolutions (one for organ localization and one for segmentation refinement). Their approach ranked first in the initial phase of the SegTHOR challenge.
[56] | Thorax and abdomen | The authors propose an improvement to the U-Net and obtain a "uniform U-like encoder-decoder segmentation architecture" [56]. The architecture ranked second in the initial phase of the SegTHOR challenge.
[57] | Thorax and abdomen | The authors propose a simplified version of the Dense V-Net model with post-processing that improves the organ segmentation results.
[58] | Thorax | The paper proposes a multi-organ segmentation architecture that contains "dilated convolutions and aggregated residual connections in a U-Net styled network" [58].
[59] | Abdomen, torso | The paper proposes a 3D U-Net-like architecture [59] that is validated on 5 different organs.
[60] | Abdomen | The authors propose a multi-class segmentation architecture based on U-Net [60]. Their design has similar results to other approaches on 4 organs but with superior Dice scores for the intestine.
[61] | Abdomen | The authors propose a 3D U-Net architecture tested in a multi-organ segmentation scenario.
[62] | Abdomen | The authors propose an architecture consisting of a 3D U-Net that is enhanced by graph-cut post-processing [62] and tested in a multi-organ segmentation scenario.
[63] | Abdomen | The authors present a "pyramid-input pyramid-output" [63] architecture that can be trained in a multi-scale and partially labeled scenario. In order to discriminate the features at differing scales, they designed an "adaptive weighting layer to fuse the outputs in an automatic fashion" [63].
www.ijacsa.thesai.org

IV. ARCHITECTURES APPLICABLE TO MEDICAL MULTI-ORGAN SEGMENTATION BASED ON HYBRID METHODS
As the DL field expands, new and exciting network architectures are developed. At the same time, the possibilities for improving the existing segmentation networks are shrinking. Therefore, to overcome these challenges, hybrid approaches are used more extensively. These hybrid methods combine several network designs in the same architecture, each serving a different functional purpose. We have divided the hybrid approaches into segmentation architectures enriched with GANs, architectures enriched with RNNs, and fully hybrid approaches.

A. Hybrid Methods Employing GANs
A GAN is a type of machine learning network designed by Goodfellow et al. [64]. These networks are trained to generate new data that shares the same characteristics as a provided initial training set. In Table III we present a list of papers that propose GAN-based hybrid multi-organ architectures.
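For reference, the adversarial objective of the original GAN formulation [64] is usually written as the two-player minimax game below. Here G denotes the generator, D the discriminator, p_data the distribution of the training data and p_z the noise prior. The segmentation hybrids in Table III adapt this idea in different ways, and their exact loss functions are not reproduced here.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

In a segmentation setting, the generator's role is typically played by the segmentation network, while the discriminator judges whether a segmentation map looks like a manual annotation; this reading is a general interpretation rather than a description of any specific cited work.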

B. Hybrid Methods Employing RNNs
A Recurrent Neural Network (RNN) is a type of machine learning network that generalizes the feedforward neural network architecture and has hidden states that act as an internal memory. Through these recurrent connections, RNNs can memorize patterns from previous inputs. These architectures are applied mostly to time series prediction or speech recognition. However, because medical images usually consist of multiple adjacent slices with correlated information between them, RNNs can be employed in hybrid scenarios to improve the segmentation results. In Table IV we present a list of papers that propose RNN-based hybrid multi-organ architectures.
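The role of the hidden state as a memory across adjacent slices can be sketched with a plain recurrent cell in NumPy. This is a simplified illustration, not the architecture of any paper in Table IV: the per-slice feature vectors are assumed to come from some encoder, and the weight names and sizes are hypothetical.

```python
import numpy as np

def rnn_over_slices(slice_features, W_x, W_h, b):
    """Simple recurrence: h_t = tanh(W_x x_t + W_h h_{t-1} + b).
    Each hidden state summarizes the current slice and all previous ones."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in slice_features:  # slices are visited in anatomical order
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_slices, n_features, n_hidden = 5, 8, 4

# Assumed input: one feature vector per slice (e.g. produced by a CNN encoder).
features = rng.normal(size=(n_slices, n_features))
W_x = rng.normal(scale=0.1, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

states = rnn_over_slices(features, W_x, W_h, b)
print(states.shape)  # one hidden state per slice
```

Practical hybrids often use gated or convolutional recurrent units instead of this plain cell, but the flow of information from one slice to the next follows the same pattern.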

Table III
Ref. | Site | Important features
[65] | Brain, liver, cells | The authors propose an architecture made up of "a generative, a discriminative, and a refinement network" [65] based on U-Net. The final semantic segmentation masks are composed from the output of the three networks.
[66] | Thorax | The paper describes an architecture that trains a set of generator networks (based on U-Net) and a set of discriminators (based on FCNs). "The generator and discriminator compete against each other in an adversarial learning process to produce the optimal segmentation map of multiple organs" [66].
[67] | Thorax | The authors propose a hybrid architecture that first generates a "global localization map by minimizing a reconstruction error within an adversarial framework" [67]. Afterwards, the localization map guides an FCN for multi-organ segmentation.
[68] | Abdomen | The authors present a hybrid architecture that combines GAN-based image synthesis methods with a deep attention strategy that learns discriminative features for organ segmentation.
[69] | Abdomen | The paper describes a hybrid architecture that combines cascaded convolutional networks with adversarial networks to alleviate data scarcity limitations.

Table IV
Ref. | Site | Important features
[70] | Optic disc, cell nuclei, left atrium | The authors present a hybrid architecture that combines a CNN with an RNN.
[71] | Abdomen (small organs) | This paper presents an architecture in which a recurrent module "repeatedly converts the segmentation probability map from the previous iteration as spatial weights and applies these weights to the current iteration" [71].
[72] | Blood vessel, skin cancer, lungs | The article presents a hybrid architecture based on U-Net and RNN where the "feature accumulation with recurrent residual convolutional layers" [72] provides better segmentation end results.
[73] | Abdomen | The paper proposes an attention gate (AG) model that can be integrated into neural networks. "Models trained with AGs implicitly learn to suppress irrelevant regions in an input image" [73].
[74] | Vertebrae, liver | The authors present a hybrid architecture that consists of a U-Net-like network enhanced with bidirectional C-LSTM [74].

C. Fully Hybrid and Generic Segmentation Improvement Methods
In Table V we present hybrid methods that do not fit in any previous category and generic segmentation improvement methods in multi-organ scenarios.
Table V
Ref. | Site | Important features
[75] | Torso and abdomen | The authors present a sample selection method [75] that improves the training of neural networks. The method is tested in a multi-organ segmentation scenario.
[76] | Abdomen | The authors investigate the "effectiveness of learning from multiple modalities to improve the segmentation accuracy" [76].
[77] | Abdomen | The authors propose an architecture in which an initial model is trained on annotated data to generate pseudo labels that enrich the training data for a second model, which performs the final segmentation.
[78] | Abdomen | The authors propose an architecture that incorporates anatomical domain knowledge on abdominal organ sizes to guide and improve the training process.
[79] | Retina, lungs | The paper describes an architecture that "embeds edge-attention representations to guide the process of segmentation" [79].
[80] | Heart, gland, lymph node | The authors propose an architecture that first decomposes the segmentation problem into several sub-problems, then applies DL modules to each sub-problem, and lastly integrates the results to obtain the final segmentation.
[81] | Abdomen, heart, brain | The authors present an architecture that tries "to integrate local features with their corresponding global dependencies" [81] by using a guided self-attention mechanism.

V. MULTI-ORGAN DATASETS
There are multiple collaborative initiatives with medical organizations to obtain better and larger datasets usable for organ segmentation. Despite all these efforts, the amount of annotated data at the disposal of DL scientists is still low. Combining several datasets that cover different parts of the human body is one solution, but differences in modality or scale considerably reduce their usability in multi-organ segmentation scenarios. In Table VI we present several datasets that try to overcome these challenges and are usable for multi-organ validation of segmentation architectures.

VI. FINAL CONCLUSIONS
This paper is an overview of deep learning methods in medical multi-organ segmentation. Among the surveyed works, FCNs are the architectures most commonly used to perform automatic multi-organ delineation. As the amount of research related to FCNs is huge, the possibilities for improving them are dwindling. More recently, therefore, hybrid methods, whether they use GANs, RNNs or completely new architectures, have been gaining much more attention. We speculate that the number of available datasets will grow in the future, so the usage of FCNs or hybrid networks will become more straightforward. Other uncharted territory includes the usage of more intelligent semi-supervised methods, fully unsupervised methods, or reinforcement learning.