Enhancing Alzheimer's Disease Diagnosis: The Efficacy of the YOLO Algorithm Model

Abstract: The diagnosis and early detection of Alzheimer's Disease (AD) and other forms of dementia have become increasingly crucial as the population ages. In recent years, deep learning, particularly the You Only Look Once (YOLO) architecture, has emerged as a promising tool in neuroimaging and machine learning for AD diagnosis. This review investigates recent advances in the application of YOLO to AD diagnosis and classification. We examined five research papers that have explored the potential of YOLO, covering the methodologies, datasets, and results presented. Our review reveals the strides made in AD diagnosis using YOLO, while also highlighting challenges such as data scarcity and the limited volume of research. The paper provides insights into the growing role of YOLO in the early detection of AD and its potential to transform clinical practice. This review aims to inspire further research and innovation to enhance AD diagnosis and, ultimately, patient care.


I. INTRODUCTION
There is surging interest in the application of Artificial Intelligence (AI) in healthcare. Healthcare-related AI research has seen rapid publication growth since 2012, with a 45.1% increase in the past five years driven by technological breakthroughs, and output is expected to continue doubling approximately every two years if this trend holds [1]. AI has become a transformative force in the healthcare sector, reshaping approaches to diagnosis, treatment, and the management of medical conditions. In recent years, it has offered solutions to some of the most formidable challenges in medicine, particularly for neurological diseases. Neurological diseases, which encompass a diverse spectrum of conditions such as Alzheimer's disease (AD), stroke, and Parkinson's disease, pose intricate challenges in diagnosis and treatment [2,3]. AI applications in this area are both diverse and promising. AI, particularly machine learning (ML) and deep learning (DL) architectures, can analyze large volumes of brain imaging data, including magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT) scans, to uncover subtle anomalies that might elude human perception [4,5,6]. These AI-driven approaches address several limitations of conventional diagnostic and treatment methods, such as subjectivity, delayed diagnoses resulting from inconspicuous early-stage symptoms, and findings imperceptible to human observers. This proficiency in early detection of neurological disorders offers the potential for swifter and more precise diagnoses.
In particular, the deep learning object detection algorithm known as You Only Look Once (YOLO) shows great promise in enhancing the accuracy, efficiency, and automation of diagnosing neurological diseases, especially Alzheimer's disease. The primary aim of this brief review is to investigate the present applications of YOLO in the classification of neurological diseases, with a particular focus on Alzheimer's disease. Additionally, we discuss the methods used and the challenges faced when applying AI to the diagnosis and treatment of neurological diseases.

A. Artificial Intelligence in AD Diagnosis
AD is a complex neurological condition that has captured the attention of scientists, healthcare professionals, and society at large. Named after Dr. Alois Alzheimer, who first described the disease in the early 20th century [7], it is a progressive and degenerative brain disorder that predominantly affects memory, cognitive function, and daily life activities. The impact of AD extends far beyond the affected individuals, placing an immense emotional and practical burden on their families and caregivers. It is the most common cause of dementia, a term that encompasses a range of cognitive impairments that interfere with an individual's ability to think, reason, remember, and communicate. AD presents a profound challenge to both the medical community and society as a whole [8]. It is estimated that over 50 million people worldwide are currently affected by AD [9]. As the global population ages, this number is projected to rise significantly in the coming decades. AD has grown into one of the most prevalent and impactful health concerns of our time [10].
As with numerous other neurological disorders [11], early diagnosis is crucial to the care and strategic planning for Alzheimer's disease (AD). AD classification distinguishes three levels: Alzheimer's disease (AD), mild cognitive impairment (MCI), and cognitively normal (CN). Early identification at the MCI level empowers individuals and their families to take proactive steps in addressing critical aspects of their future, encompassing healthcare preferences, support requirements, and financial and legal considerations [12,13]. Additionally, early detection allows for proactive safety measures to reduce the risk of wandering or disorientation-related incidents. Moreover, it opens up the possibility of participating in clinical trials for innovative treatments during the disease's early stages, contributing to advancements in research.
*Corresponding Author. www.ijacsa.thesai.org
Despite recent advancements in clinical trials related to Alzheimer's disease, several challenges remain. These include the difficulty of distinguishing AD from normal age-related cognitive changes, limited access to specialized diagnostic tools in certain geographic regions, and the growing number of individuals affected by the disease [14]. Consequently, the role of computer applications in AD diagnosis has become increasingly crucial. Among these, deep learning, a subfield of machine learning and a pivotal element of artificial intelligence, has achieved impressive results in fields like object recognition and computer vision [15]. This has led to the extensive integration of deep learning in neuroimaging analysis, where neural network architectures with non-linear activation functions play a pivotal role in tasks like image classification [16], particularly in AD neuroimaging [17]. This encompasses various modalities, including MRI, PET, CT, and functional MRI (fMRI) [18].

B. Advances in Machine Learning in Neuroimaging
Brain imaging can be categorized into distinct types based on various criteria. One such classification pertains to imaging modality, which can be divided into structural and functional imaging. Structural imaging, exemplified by MRI, offers high-resolution images that reveal detailed brain anatomy, encompassing gray and white matter as well as cerebrospinal fluid. It detects changes in brain volume and atrophy patterns, which are key indicators of Alzheimer's disease. Functional MRI (fMRI), while primarily used for functional studies, can also provide insights into structural connectivity through techniques like resting-state functional connectivity, and alterations in functional connectivity can be associated with structural changes in AD. Recently, deep learning architectures have demonstrated the capability to process complete 3D brain images end-to-end [19,20,21]. However, the foremost challenge is the high computational cost, which demands substantial processing power and can result in extended training times. Overfitting is another concern, as is the need to ensure model interpretability. Data preprocessing is a critical stage in preparing both 2D and 3D data, although 3D data introduces added complexity.
In more detail, data preprocessing is a fundamental step in preparing raw data for machine learning algorithms. Its significance stems from the fact that real-world data can be noisy, incomplete, or poorly formatted. By cleaning and structuring the data, preprocessing can significantly enhance the accuracy and effectiveness of machine learning models. In neuroimaging analysis, data preprocessing and feature extraction are indispensable: they enhance data quality, mitigate noise, establish data consistency, increase statistical power, facilitate data interpretation, and improve research precision. Nevertheless, data preprocessing may also introduce limitations of its own that warrant consideration in the research process.

C. Limitations of Deep Learning in Alzheimer's Disease Diagnosis
The importance of deep learning in Alzheimer's Disease (AD) classification has become increasingly apparent, resulting in a notable upswing in research from 2017 onward [17]. These investigations have reported accuracy levels ranging from 70% to 99% [22]. Notably, Sarraf et al. (2016) achieved accuracy rates of 98.84% for MRI [23] and 99.99% for fMRI [24] pipelines, while Suk et al. (2013) [25] attained an accuracy of 98.8%. However, these studies commonly rely on diverse MRI pre-processing techniques to attain optimal results and focus predominantly on Convolutional Neural Networks (CNN), leaving a distinct research gap in deep learning for object detection. Consequently, there is a pressing need to explore new research avenues that minimize the dependence on these pre-processing techniques.

D. Advancement of YOLO for Alzheimer's Disease Diagnosis
Numerous researchers have worked on deploying deep learning models for object detection in medical imaging, particularly in Alzheimer's Disease diagnosis. This effort builds on the YOLO model and its various iterations, which represent significant milestones in the development of this approach.

E. Convolutional Neural Networks
A key technique within deep learning is the Convolutional Neural Network (CNN) [26]. These networks take inspiration from the human visual system and are designed to conduct hierarchical learning. This process models features at various levels, allowing the extraction of increasingly abstract representations from the input data. CNNs are constructed with multiple layers, including convolutional, activation, and pooling layers. To produce final output predictions, one or more Fully-Connected (FC) layers are added to the network. Ang et al. (2017) illustrated the architecture of a CNN using a diagram (see Fig. 1).
Several notable CNN models have been developed, including LeNet [28], AlexNet [29], ResNet [30], and GoogLeNet [31]. Moreover, CNN-based object detectors can be categorized into two main types: one-stage architectures and two-stage architectures. In a two-stage detector, such as Faster R-CNN (Region-based Convolutional Neural Network) [32], the object detection process is divided into two distinct steps: region proposal and classification. Initially, the model generates region proposals, which are candidate regions within an image where objects might be located. Each proposal is then passed through a classifier to determine whether it contains an object and, if so, to identify the object's class. One-stage detectors, on the other hand, take a more streamlined approach in which object detection occurs in a single step, without a separate region proposal stage. These models directly predict bounding boxes and class labels for objects within an image, making them efficient and suitable for real-time object detection. However, they may not always achieve the same level of accuracy as two-stage models. Examples of one-stage detectors include YOLO and the Single Shot MultiBox Detector (SSD).
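The convolutional, activation, and pooling layers described in this subsection can be illustrated with a minimal sketch in plain Python. It is not taken from any of the reviewed papers: the 6x6 toy image and the vertical-edge kernel are hypothetical, and the convolution uses stride 1 without padding as a simplifying assumption.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window filter over a 2D list."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise activation: negative responses are set to zero."""
    return [[max(v, 0.0) for v in row] for row in fmap]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical toy input and filter, for illustration only.
image = [[float((r * 5 + c) % 7) for c in range(6)] for r in range(6)]
edge_kernel = [[1.0, 0.0, -1.0]] * 3

features = max_pool2d(relu(conv2d(image, edge_kernel)))
print(len(features), len(features[0]))  # 6x6 input -> 4x4 after conv -> 2x2 after pooling
```

In a full CNN, many such filters are learned from data and stacked in several layers, and the pooled feature maps are flattened and passed to the fully-connected layers.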

F. LeNet Architecture
LeNet, short for "LeNet-5", is an architecture introduced by LeCun et al. in 1998 [28], as depicted in Fig. 2. This landmark innovation has played an integral role in shaping deep learning and CNNs. It was one of the first successful applications of neural networks to computer vision tasks, particularly handwritten digit recognition, for example recognizing digits in postal (ZIP) codes.
LeNet's structure is organized into two core components: the Convolutional Part and the Fully-Connected Part. The Convolutional Part contains three layer types: an Input Layer designed to handle 32x32 grayscale images (smaller images, such as the 28x28 digits in MNIST, are zero-padded to this size), two Convolutional Layers (CL) employing 5x5 filters, and two pooling (subsampling) layers tasked with efficient feature map downsampling. Meanwhile, the Fully-Connected Part incorporates three FC layers, also known as Dense layers, responsible for capturing intricate data relationships, concluding with an Output Layer featuring a softmax function to categorize handwritten digits, as exemplified by the MNIST dataset, which consists of black-and-white images of the digits 0-9. However, LeNet was primarily designed for the specific task of recognizing handwritten digits, which limits its applicability to a broader range of image classification tasks.
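To make the layer arithmetic concrete, the following sketch traces how the spatial size of the feature maps shrinks through the convolutional part. The 32x32 input and 5x5 filters come from the description above; the convolution stride of 1, the absence of padding, and the 2x2 pooling windows with stride 2 are assumptions made for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_feature_sizes(input_size=32):
    """Trace the feature-map side length through the convolutional part."""
    sizes = [("input", input_size)]
    size = input_size
    # (layer name, kernel, stride); strides and pooling windows are assumed values
    for name, kernel, stride in [
        ("conv1 5x5", 5, 1),
        ("pool1 2x2", 2, 2),
        ("conv2 5x5", 5, 1),
        ("pool2 2x2", 2, 2),
    ]:
        size = conv_out(size, kernel, stride)
        sizes.append((name, size))
    return sizes


for name, size in lenet_feature_sizes():
    print(f"{name}: {size}x{size}")  # sizes run 32 -> 28 -> 14 -> 10 -> 5
```

Under these assumptions, the second pooling layer hands 5x5 feature maps to the fully-connected part.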

G. AlexNet Architecture
In 2012, Krizhevsky et al. [29] introduced AlexNet, a pioneering convolutional neural network (CNN) that revolutionized deep learning. It significantly increased the depth of CNNs and incorporated effective parameter optimization strategies, marking a breakthrough in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). AlexNet achieved a top-5 error rate of just 15.3%, outperforming traditional computer vision methods and setting a new standard at the time. The concept of AlexNet is illustrated in Fig. 3.
AlexNet marked a significant milestone for deep convolutional neural networks by training a complex model with millions of parameters on an extensive dataset of more than 15 million images. This achievement underscored the capacity of deep networks to extract intricate features from massive datasets. Moreover, AlexNet popularized the adoption of Rectified Linear Units (ReLU) [33] as an activation function, which not only improved computational efficiency but also expedited training convergence. Furthermore, to combat overfitting, a key concern in deep learning, the technique of dropout was employed. This involved randomly setting the outputs of hidden neurons to zero with a probability of 50% during training, effectively excluding them from the backpropagation process. These innovations not only contributed to AlexNet's success but also inspired the design of subsequent modern architectures.
Fig. 2. The concept of LeNet [28].
Fig. 3. The concept of AlexNet [29].
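The ReLU activation and the 50% dropout described above can be sketched in a few lines. This is an illustrative version rather than AlexNet's actual code: it uses the now-common "inverted" variant, which rescales the surviving activations during training, whereas the original AlexNet formulation scaled outputs at test time instead.

```python
import random


def relu(values):
    """Rectified Linear Unit: negative activations become zero."""
    return [max(v, 0.0) for v in values]


def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Inverted dropout on a flat list of activations.

    Assumption: survivors are rescaled by 1 / (1 - drop_prob) during training
    so that no rescaling is needed at inference time.
    """
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


# Hypothetical hidden-layer pre-activations.
hidden = dropout(relu([-1.0, 0.5, 2.0, 3.0]), rng=random.Random(0))
```

A different random subset of neurons is silenced on every training step, so the network cannot rely on any single hidden unit.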

H. GoogLeNet Architecture
In the 2014 ILSVRC, GoogLeNet, also known as Inception-V1, achieved first place [31] (see Fig. 4). A significant innovation of GoogLeNet is its use of inception modules, which are tailored to capture features at multiple spatial scales. These modules apply convolutional filters of different sizes (5x5, 3x3, and 1x1) in parallel to integrate channel and spatial information across a range of spatial resolutions, enabling the network to extract features at both fine and coarse levels simultaneously. This design enhances feature learning efficiency.
Additionally, GoogLeNet uses 1x1 convolutions to reduce the dimensionality (number of channels) of feature maps, resulting in a computationally efficient architecture. This not only permits the construction of deeper networks but also significantly reduces the number of parameters to 5 million, compared with AlexNet's 61 million. These design choices make GoogLeNet well-suited for real-time and resource-efficient applications. However, GoogLeNet's limitations include its complexity, resource-intensive training, and reduced suitability for tasks beyond image classification.
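A short parameter count shows why a 1x1 reduction layer saves so much. The channel counts below are hypothetical values chosen only for illustration; the sketch compares a direct 5x5 convolution with the same convolution preceded by a 1x1 reduction.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


# Hypothetical channel counts, chosen only for illustration.
in_ch, reduced_ch, out_ch = 192, 16, 32

direct = conv_params(in_ch, out_ch, 5)
bottleneck = conv_params(in_ch, reduced_ch, 1) + conv_params(reduced_ch, out_ch, 5)

print(direct, bottleneck)  # 153632 15920
```

With these example values, the 1x1 reduction cuts the parameter count of the branch by roughly a factor of ten while keeping the same number of output channels.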

I. ResNet Architecture
ResNet, introduced by He et al. [30], made a significant breakthrough in deep learning by winning the ILSVRC 2015 competition with a remarkably deep architecture of 152 layers, roughly 20 times deeper than AlexNet. The core challenge that ResNet addresses is the training of such deep neural networks, which previously suffered from issues like vanishing gradients and a decline in accuracy with increased depth. To overcome these challenges, ResNet introduces residual connections, also known as skip connections. These connections ease the training of exceptionally deep networks by promoting the efficient flow of gradients during training. Each residual block in a ResNet contains a "shortcut connection" that bypasses one or more layers, enabling the network to learn residual functions. In effect, each block combines a traditional feedforward path with a shortcut that adds the block's input to its output. The residual function captures the difference between the desired mapping and the block's input, making it easier for the network to learn identity mappings. ResNet models are available in various depths, including ResNet-50, ResNet-101, and ResNet-152, which are widely adopted for image classification tasks.
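The shortcut connection can be sketched as follows. For brevity, this illustration replaces ResNet's convolutional layers with small dense layers, which is a simplification rather than the architecture of [30]; the point is only that the block outputs F(x) + x instead of F(x).

```python
def linear(x, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(v, 0.0) for v in x]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two dense layers with a ReLU between."""
    residual = linear(relu(linear(x, w1, b1)), w2, b2)  # F(x)
    return relu([r + v for r, v in zip(residual, x)])   # shortcut adds the input back


# With all weights at zero, F(x) = 0 and the block passes non-negative
# inputs through unchanged, i.e. it starts out as an identity mapping.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0]))  # [1.0, 2.0]
```

Because the identity mapping is available at no cost, additional blocks need not degrade accuracy, and gradients can flow through the addition directly to earlier layers.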

J. Faster R-CNN
Ren et al. [32] proposed the Faster R-CNN algorithm, which integrates region proposal generation within a deep neural network. Faster R-CNN introduces the region proposal network (RPN), a neural network module designed to generate region proposals directly from the input image. This replaces the need for external algorithms like selective search or edge boxes.
The RPN takes an image of any size and suggests candidate object bounding boxes based on features learned from the image. It employs anchor boxes, which are pre-defined bounding box shapes at various scales and aspect ratios, to propose object regions efficiently. Faster R-CNN uses a two-stage detection approach. In the first stage, the RPN generates region proposals. In the second stage, another CNN, known as Fast R-CNN [34], carries out object detection and precise bounding box regression based on the generated region proposals.
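As an illustration of anchor boxes, the sketch below generates one anchor per combination of scale and aspect ratio around a single location. The default scales and ratios are assumptions chosen for this example, not values reported in this review.

```python
import math


def make_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on one location.

    Each anchor has area scale**2 and a width/height ratio equal to `ratio`.
    The default scales and ratios are illustrative assumptions.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            anchors.append((
                center_x - width / 2, center_y - height / 2,
                center_x + width / 2, center_y + height / 2,
            ))
    return anchors


anchors = make_anchors(300.0, 200.0)
print(len(anchors))  # 9 anchors: 3 scales x 3 aspect ratios
```

The RPN then scores each anchor for the presence of an object and regresses offsets that refine it into a region proposal.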

K. YOLO Architecture
The primary innovation in Faster R-CNN lies in its Region Proposal Network (RPN), which generates high-quality region proposals directly within the network. This advancement results in faster inference times while upholding the accuracy required for object detection tasks. However, Faster R-CNN's two-stage architecture introduces a complex pipeline that demands precise tuning of each stage independently, resulting in a system with significant computational overhead.
In an attempt to simplify the process and make it more efficient, YOLO (see Fig. 5), created by Redmon et al. [35], takes a different approach. YOLO partitions the input image into an S × S grid of cells, and a grid cell is responsible for detecting an object if the object's center is located within it. Each grid cell predicts B bounding boxes, each with a confidence score, together with C class probabilities. These predictions are organized as a tensor with dimensions S × S × (B · 5 + C). Here, the factor 'five' corresponds to the attributes predicted for each bounding box: the two center coordinates, the width, the height, and the confidence score.
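The grid-based prediction scheme can be made concrete with a small sketch. It assumes a square grid with S cells per side (the symbol S and the example values S = 7, B = 2, and a 448x448 image are assumptions for illustration) and uses C = 3 to mirror the AD, MCI, and CN classes discussed in this review.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Each cell predicts B boxes x 5 values (x, y, w, h, confidence) plus C class probabilities."""
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def responsible_cell(center_x, center_y, image_width, image_height, grid_size):
    """Row and column of the grid cell that contains an object's centre."""
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


# Illustrative values: 7x7 grid, 2 boxes per cell, 3 classes (AD, MCI, CN).
print(yolo_output_shape(7, 2, 3))               # (7, 7, 13)
print(responsible_cell(300, 100, 448, 448, 7))  # (1, 4)
```

In this example, an object centred at pixel (300, 100) is assigned to the cell in row 1, column 4, and only that cell is responsible for predicting it.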
Moreover, YOLO consolidates the various aspects of object detection into a unified neural network, utilizing information from the entire image to make predictions for each bounding box. This integration enables YOLO to simultaneously predict bounding boxes for all categories within a given image. YOLO's architecture offers the advantages of end-to-end training and real-time processing speed while maintaining a high level of precision in object detection. Taking cues from GoogLeNet, YOLO is structured as a series of 24 CL followed by 2 FC layers. In contrast to GoogLeNet's inception modules, YOLO follows a more straightforward approach, using 1×1 reduction layers followed by 3×3 CL. Additionally, YOLO shares certain similarities with R-CNN, particularly Faster R-CNN, in that each grid cell generates potential bounding boxes and assigns scores to them using convolutional features. After predictions are computed across all grid cells, a Non-Maximum Suppression (NMS) mechanism is employed to eliminate redundant or overlapping bounding boxes.
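The NMS step can be sketched as a greedy procedure based on intersection over union (IoU). The boxes, scores, and the 0.5 threshold below are hypothetical values for illustration; production implementations typically also apply NMS per class.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box is suppressed because its IoU with the higher-scoring first box exceeds the threshold, while the third box does not overlap either and is kept.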
Since its initial introduction in 2016, YOLO has undergone a series of iterations, adapting to the requirements of diverse application fields. Each subsequent version has been refined to meet the evolving challenges and demands of real-time object detection and various computer vision applications.

IV. DISCUSSION
Uddin et al. [40] conducted a comparative analysis of three deep learning architectures, namely YOLOv4, AlexNet, and Faster R-CNN. Their research used a dataset of 6400 MRI images, the largest among the studies reviewed. However, a notable aspect of their dataset was the relatively large number of CN (cognitively normal) images, which stood at 2560 training images. This dataset composition, characterized by an abundance of CN images and a scarcity of AD (Alzheimer's disease) and MCI (mild cognitive impairment) images, raised concerns about the potential for overfitting. The resulting models exhibited a propensity to classify most images as CN due to the skewed distribution of classes. This highlights the need for improved dataset balance, including a more representative inclusion of AD and MCI images. Addressing this class imbalance could lead to more reliable and accurate classification results, reducing the risk of overfitting and enhancing the model's overall performance.
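One common way to mitigate such class imbalance, offered here as an illustration rather than as a method used by the reviewed study, is to weight each class inversely to its frequency in the training loss. In the sketch below, only the CN count of 2560 comes from the text; the MCI and AD counts are hypothetical placeholders.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count), so rarer classes weigh more."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {label: total / (num_classes * count) for label, count in class_counts.items()}


# Only the CN count (2560) comes from the text; the MCI and AD counts are
# hypothetical placeholders used to show the effect of imbalance.
counts = {"CN": 2560, "MCI": 1800, "AD": 760}
weights = inverse_frequency_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

With these weights, every class contributes the same total weight to the loss, so errors on the rarer AD images are penalized more heavily than errors on the abundant CN images.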
In a study conducted by Alon et al. [36], the YOLOv3 architecture demonstrated an accuracy rate of 80%, the lowest among the studies under review. The study employed a dataset of 1000 MRI images for training and validation, reporting a training accuracy of 98.617%, a validation accuracy of 98.8207%, and a mean average precision (mAP) of 96.17%. However, certain factors might affect the reliability and generalizability of these findings. One notable concern is the study's reliance on a relatively small subset of only 20 MRI images for testing. Such a small test set introduces uncertainty into the measured performance, as it may not capture the variations present in a more extensive dataset. Additionally, the absence of information regarding any pre-processing applied to the dataset raises questions about the data's quality and its readiness for deep learning analysis. To enhance the credibility of these findings and ensure their generalizability, further evaluations on larger and more diverse datasets are advisable. This would provide a more comprehensive assessment of the model's robustness and validate its performance across a broader range of MRI images.
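The uncertainty introduced by a 20-image test set can be quantified with a confidence interval. The sketch below computes a Wilson score interval under the assumption that the reported 80% accuracy corresponds to 16 correct predictions out of the 20 test images; that mapping is our reading of the figures, not a detail stated by the study.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return center - half_width, center + half_width


# Assumption: 80% accuracy on 20 test images means 16 correct predictions.
low, high = wilson_interval(16, 20)
print(f"{low:.2f} to {high:.2f}")  # roughly 0.58 to 0.92
```

An interval this wide means the test result is compatible with both a mediocre and a strong classifier, which supports the call for evaluation on larger test sets.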
In a concurrent research effort, Islam et al. [37] investigated the use of various YOLO versions for image classification, evaluating the performance of different YOLO iterations in the context of object recognition. The findings revealed that YOLOv3 and YOLOv4 outperformed YOLOv5. This difference was attributed to the adaptable Darknet3 backbone, a crucial component of YOLOv3 and YOLOv4, which excels at object detection and enhanced the accuracy and efficiency of these versions. YOLOv6 and YOLOv7 in turn surpassed YOLOv4. This improvement was achieved by passing the input through multiple CNN layers in the backbone, resulting in increased computational efficiency and better overall performance. However, these models primarily focused on single-class detection, which may limit their applicability in scenarios where multi-class detection is required. The detailed results are presented in Table II.
Fong et al. [38] (see Table III) conducted an extensive investigation aimed at streamlining the preprocessing stage in medical image analysis. They achieved this by implementing YOLOv3 and employing a dataset consisting of [...]
Abd-Aljabar et al. [39] also utilized YOLOv2 with a dataset of 300 raw MRI images, achieving an accuracy of 98%, which is slightly lower than the 99.8% reported by Fong et al. Nevertheless, this outcome reaffirms the effectiveness of YOLO variations in handling raw and unprocessed MRI images, offering an alternative approach to streamline the pre-processing stage in medical image analysis. These findings collectively emphasize the adaptability and robustness of YOLO-based models in handling diverse image data without extensive pre-processing, potentially simplifying the workflow for neuroimaging analysis.
In summary, YOLO has proven to be a promising tool for tasks related to Alzheimer's disease diagnosis and classification. However, persistent challenges hamper progress in neuroimaging research: the scarcity of available data, a pronounced imbalance in class distribution within datasets, and a noticeable research gap. Addressing these issues through further data collection, careful dataset curation, and expanded research efforts is essential to fully unlock the potential of YOLO and other deep learning approaches in neuroimaging research.

V. CONCLUSION
In conclusion, our review explores the evolving landscape of the You Only Look Once (YOLO) architecture as applied to the diagnosis of AD. In a world where an aging population underscores the critical need for early and accurate AD detection, deep learning methods have emerged as a promising solution. YOLO, with its lightweight design, rapid processing, and high reported accuracy, shows considerable potential for reshaping neuroimaging in AD classification. Further research in YOLO and deep learning is strongly encouraged. Moreover, techniques like explainable AI (X-AI) could be applied, or specific architectures based on or inspired by YOLO could be developed. This continued exploration promises to advance the quality of care for individuals afflicted by AD and other neurodegenerative diseases.

Fig. 5. The concept of YOLO [34].

III. RESULTS
While YOLO's recent trends have leaned towards real-time applications, its potential in the medical imaging field, particularly for diagnosing Alzheimer's disease (AD), has drawn significant interest. Originally developed for object recognition, YOLO has shown promise when adopted for AD diagnosis. Nevertheless, the need for further research, as highlighted in Table I, emphasizes the importance of ongoing investigations to advance AD diagnosis and treatment.

TABLE I. SUMMARY OF ALZHEIMER'S DISEASE DIAGNOSIS STUDIES