Automated Detection of Driver and Passenger Without Seat Belt using YOLOv8

Traffic accident fatalities are a serious concern on a global scale, and one of the contributing factors is the failure of drivers to wear seat belts. A notable challenge is the limited number of law enforcement personnel available to monitor this issue, which creates a compelling need for an automated detection system. Such a system has previously been developed using YOLOv5, but it has weaknesses related to the length of training and detection time. Therefore, this paper proposes a new system using the YOLOv8 method to detect drivers and passengers who violate seat belt regulations. The proposed system is divided into three subsystems: windshield detection, passenger classification, and seat belt classification. YOLOv8 is the latest version of the YOLO (You Only Look Once) method and has been shown to provide better performance than previous versions. Furthermore, this paper compares five YOLOv8 models, namely YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. The proposed model is trained and tested using image data collected from several roads in Indonesia. The experiment results show that the YOLOv8s model produced the best mean Average Precision (mAP) of 0.960 for windshield detection. The YOLOv8s-cls and YOLOv8l-cls models achieved the same accuracy of 0.8923 for passenger classification. The YOLOv8l-cls model produced the best accuracy of 0.8846 for seat belt classification. In addition, the proposed method increases mAP and reduces training time for windshield detection compared to YOLOv5.


I. INTRODUCTION
Traffic accident fatalities are a pressing global issue, ranking among the top 10 causes of mortality in low-income countries [1]. A key factor related to this issue is drivers' and passengers' non-adherence to seat belt usage. Even though seat belt use is mandatory, instances of non-compliance are alarmingly frequent. Using seat belts can significantly reduce the risk of severe injury or death in a traffic accident. Therefore, it is essential to raise awareness among vehicle occupants of the importance of wearing seat belts. Efforts are also needed to improve surveillance and enforcement of the regulations on the road.
At present, monitoring of drivers and passengers in four-wheeled or larger vehicles is accomplished through direct observation. This approach has several reported weaknesses, since many violations remain undetected. The issue can be addressed by using Closed-Circuit Television (CCTV) cameras for road surveillance combined with automated detection based on computer vision. Such automatic detection requires a method that offers both high accuracy and high speed. This application can help the police detect drivers and passengers not wearing seat belts, making their work more efficient.
Research into computer vision-based detection of drivers' compliance with seat belt regulations commenced approximately a decade ago. Generally, this research can be divided into two types, based on handcrafted and non-handcrafted features. The use of handcrafted features was shown by Qin et al., who combined Haar-like and Histogram of Oriented Gradient (HOG) descriptors and reported efficiency and robustness [2]. Additionally, Elihos et al. used Fisher Vector [3]. Most research used non-handcrafted features, such as BN-AlexNet [4], Convolutional Neural Network (CNN) [5], A Nimble Architecture for Driver and Seat Belt Detection through Convolutional Neural Networks (NADS-Net) [6], YOLOv3 [7], and YOLOv5 [8], [9]. YOLOv5 achieved the best mean Average Precision (mAP) compared to other methods for detecting vehicle windshields, passengers, and drivers without seat belts [9]. However, it has weaknesses related to training and detection time. In addition, this method produces incorrect detections on low-quality data [10]. For this reason, a method is needed that increases detection accuracy and reduces detection time so that it can be applied in real time.
The latest version of YOLO (You Only Look Once), YOLOv8, is more advanced than its predecessors. YOLOv8 produces high average accuracy and scores much better than YOLOv5 [17]. Therefore, this paper developed a new system using YOLOv8 to detect drivers and passengers without seat belts. The proposed system is divided into three subsystems: windshield detection, passenger classification, and seat belt classification. The main contribution of this paper is that the proposed method produces relatively high accuracy, so it can potentially be applied to actual conditions. The remainder of this paper is organized as follows: Section II covers related work. Section III describes the dataset, the proposed method, and the parameters used to evaluate the performance of the proposed method. Section IV explains the test results and analysis of the windshield detection, passenger classification, and seat belt classification subsystems. Additionally, this section compares the proposed method with previous research. Finally, Section V presents the conclusions and future research.

II. RELATED WORK
Research on computer vision-based detection of drivers without seat belts started about a decade ago. Generally, this research can be divided into two types, based on handcrafted and non-handcrafted features. Guo et al. used license plate and edge detection to determine the driver's area and the position of the seat belt [18]. Li et al. also used Canny edge detection, with the driver's area determined by cutting the left half of the windshield [19]. Meanwhile, the position of the windshield area was detected using the cascade Adaboost classifier.
Zhou et al. proposed Canny edge detection and a salient gradient map for feature extraction, together with learning-based algorithms for binary classification [20]. Yang et al. determined the driver's area using face and seat belt detection through connected area methods [21]. Wu et al. introduced a methodology that includes ascertaining the driver's area through semantic segmentation, streamlined by a pruning process, and conducting classification using connected techniques [22]. Yongquan et al. focused on reducing computation time for detecting drivers without seat belts by designing a Graphics Processing Unit (GPU) acceleration method [23]. The driver's area was obtained using Squeeze-YOLO, while seat belt usage was determined using semantic segmentation algorithms and full convolution network pruning. Wang et al. also used semantic segmentation, but with lightweight feature extraction and the Squeeze-YOLO algorithm to determine the driver's area [24].
To improve accuracy, some studies have combined multiple descriptors. For instance, Qin et al. integrated Haar-like features and HOG [25]. Madake et al. combined feature extraction techniques such as Canny, FAST (Features from Accelerated Segment Test), and BRIEF (Binary Robust Independent Elementary Features) [26]. The test results showed that the proposed feature extraction combinations improved accuracy.
In recent years, most research has used CNN. Sajja et al. used this method and compared it to Support Vector Machine (SVM) [27]. Test results showed that the CNN method achieved better accuracy than SVM. Kapdi et al. adopted the MobileNetV2 model [28], which had the advantage of robustness in different weather conditions. Meanwhile, Chen et al. combined CNN and SVM for feature extraction and classification. This method was based on multiscale feature extraction and applied to images with complex road backgrounds. The proposed method achieved better average detection than the Adaboost algorithm and CNN [5]. Kannadaguli proposed a detection method using Fully Convolutional One-Stage (FCOS) object detection and added prediction elimination with Non-Maximum Suppression (NMS) [29]. Additionally, Elihos et al. compared the Single Shot MultiBox Object Detector (SSD), VGG16 model, shallower CNN model, and Fisher vector model [3]. The test results showed that the SSD model achieved the highest accuracy.
Some versions of YOLO have also been used in this field. Luo et al. used YOLOv3 to determine the driver's area and CNN for the classification process [30]. The testing showed that the proposed method achieved high accuracy and robustness in complex environments. Wang and Ma also used YOLOv3 and a lightweight network structure [7]. In their work, increasing the number of lightweight templates improved accuracy but reduced speed. Furthermore, Khalid and Hazela used YOLOv4 to determine the driver's area and the AlexNet model for classification [31]. Feng et al. proposed YOLOv5 to locate the driver's area and used the AlexNet deep convolutional network for classification [8]. This method saved memory and computational time compared to SVM. Hosseini and Fathi used YOLOv5 to detect car windshields and the ResNet32 model for classification [9].
Maduri et al. proposed a method for real-time detection of drivers without seat belts [32]. This method used deep learning and was embedded in a Raspberry Pi. Upadhyay et al. also focused on real-time conditions and implemented the concept in real-world scenarios by installing a camera in a car cabin and using YOLOv5 [33]. Zang et al. proposed the SlimSSDMV2 and Line Segment Detector (LSD) models, which were applied to mobile devices [34].

A. Dataset
This research used three datasets, namely Dataset1, Dataset2, and Dataset3. The datasets were captured from video frame recordings on several roads, using CCTV and cameras in Bandung and Semarang City, Indonesia. Fig. 1 shows samples of images from Dataset1. This dataset contains video frame images with annotations of windshields used in the windshield detection subsystem. Fig. 2 shows samples of images from Dataset2 used in the passenger classification. This dataset consists of a set of seating area images comprising two classes, namely 488 passenger images and 484 no-passenger images. These images are obtained from the right half of the windshield image. Fig. 3 shows samples of images from Dataset3 used in the seat belt classification. This dataset contains images of drivers and passengers with and without seat belts. The number of images of drivers or passengers wearing seat belts is 957, and the number of images of drivers or passengers not wearing seat belts is 727. Table I shows the number of training and testing data in each dataset. The training and testing data are split randomly, with a ratio of 80% for training data and 20% for testing data.
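The random 80/20 split described above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the file names, the fixed seed, and the rounding rule are hypothetical choices, not details reported for this research.

```python
import random


def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    # A seeded generator makes the random split reproducible (assumed choice).
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 488 + 484 images of Dataset2.
images = [f"img_{i:04d}.jpg" for i in range(488 + 484)]
train_images, test_images = split_dataset(images)
print(len(train_images), len(test_images))
```

A plain shuffle does not guarantee the same class ratio in both parts; a per-class (stratified) split would be an alternative when the classes are imbalanced.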

B. Proposed System
This research built a system for detecting drivers or passengers without seat belts. The proposed system consists of three subsystems: windshield detection, passenger classification, and seat belt classification. Fig. 4 displays this proposed system.

1) Windshield detection: A car's driver and front passengers are visible through the car's windshield. Therefore, the initial step is to detect the position of each car's windshield using the YOLOv8 method. YOLO predicts bounding boxes (Bbox) and class probabilities (Cls) through a single network. It achieves high accuracy even with small model sizes and can be trained on a single GPU [17], which makes it cost-effective for machine learning practitioners with limited hardware resources or cloud computing. Ultralytics developed YOLOv8 as the latest model for object detection, image classification, and image segmentation. This method is an anchor-free model that directly predicts the object's center and augments images online during training. Through mosaic augmentation, the model sees variations of the provided images in each epoch. This augmentation has empirically been found to decrease performance when applied throughout the entire training routine.
Fig. 5 shows the architecture of the YOLOv8 model, divided into two main parts: the backbone and the head [36]. The backbone is based on CSPDarknet53, consists of 53 convolutional layers, and uses cross-stage partial connections to enhance information flow between different layers, promoting better feature representation and extraction. Meanwhile, the head of YOLOv8 consists of several convolutional and fully connected layers. These layers are used for object detection, including bounding box prediction, objectness score prediction, and class probabilities for the objects in the image. The model uses a feature pyramid network to detect large and small objects accurately.
2) Passenger classification: The windshield detection subsystem outputs the windshield area. This area has two seats: the driver (left) and the passenger (right). The driver's seat always has a driver, but the passenger's seat may or may not have a passenger. Therefore, the passenger classification subsystem classifies the presence of a passenger in the right seat. This subsystem comprises windshield cropping, division of the driver and passenger areas, and the classification process. The windshield cropping process extracts the windshield area of each car. The division step separates this area into left and right sides for the driver and passenger areas, respectively. The classification process distinguishes two classes (seat with passenger and seat without passenger) using the YOLOv8 method.
3) Seat belt classification: The final subsystem is the seat belt classification, which determines whether the driver and passenger are wearing a seat belt. This subsystem also uses the YOLOv8 method.
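The way the three subsystems chain together can be sketched in Python as follows. This is an illustrative outline only: the function names, the box format `(x1, y1, x2, y2)`, and the toy stand-ins for the trained YOLOv8 models are assumptions introduced for this example, not part of the proposed implementation.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates (assumed format)
Image = Sequence[Sequence[int]]  # row-major grid of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Return the pixels of image that fall inside box."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def split_windshield(box: Box) -> Tuple[Box, Box]:
    """Split a windshield box into its left (driver) and right (passenger) halves."""
    x1, y1, x2, y2 = box
    mid = (x1 + x2) // 2
    return (x1, y1, mid, y2), (mid, y1, x2, y2)


def find_violations(image, detect_windshields, has_passenger, wears_seat_belt):
    """Run the three subsystems in order and list occupants without a seat belt."""
    violations = []
    for windshield in detect_windshields(image):          # subsystem 1: detection
        driver_box, passenger_box = split_windshield(windshield)
        seats = [("driver", driver_box)]                   # driver seat is always occupied
        if has_passenger(crop(image, passenger_box)):      # subsystem 2: passenger present?
            seats.append(("passenger", passenger_box))
        for seat, box in seats:
            if not wears_seat_belt(crop(image, box)):      # subsystem 3: seat belt worn?
                violations.append((seat, box))
    return violations


# Toy stand-ins for the trained models (hypothetical, for illustration only).
frame = [[0] * 8 for _ in range(4)]
frame[1][6] = 1  # mark a "passenger" pixel in the right half

result = find_violations(
    frame,
    detect_windshields=lambda img: [(0, 0, 8, 4)],
    has_passenger=lambda seat: any(any(row) for row in seat),
    wears_seat_belt=lambda seat: False,
)
print(result)
```

Passing the three models in as callables keeps the control flow independent of any particular detection or classification library.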

C. Performance Evaluation
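Performance evaluation for windshield detection used the parameters of precision (P), recall (R), and mAP. Precision and recall are calculated as in Eq. (1) and Eq. (2), respectively [37]. $TP_{dec}$ represents correct detections of the ground truth bounding box, $FP_{dec}$ is the false detections of objects, and $FN_{dec}$ represents the undetected ground truth.

$$P = \frac{TP_{dec}}{TP_{dec} + FP_{dec}} \tag{1}$$

$$R = \frac{TP_{dec}}{TP_{dec} + FN_{dec}} \tag{2}$$

The mAP is calculated as in Eq. (3).

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{3}$$

$AP_i$ is the average precision for each class, and $N$ is the number of classes. AP for each class is calculated as in Eq. (4).

$$AP = \sum_{n} (r_{n+1} - r_n)\, p_{interp}(r_{n+1}) \tag{4}$$

$$p_{interp}(r_{n+1}) = \max_{\tilde{r} \ge r_{n+1}} p(\tilde{r}) \tag{5}$$

$p(\tilde{r})$ is the precision measured at recall $\tilde{r}$.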
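As a worked illustration of the detection parameters in this subsection, the sketch below computes precision, recall, an interpolated AP, and mAP in plain Python. It assumes that detections have already been matched to ground truth boxes and sorted by descending confidence, and it uses all-point interpolation of the precision-recall curve; the evaluation tooling used for the experiments may interpolate differently, so the numbers are illustrative only.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (zero when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(matches, num_ground_truth):
    """All-point interpolated AP for one class.

    matches: one boolean per detection, sorted by descending confidence;
             True marks a true positive, False a false positive.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for is_tp in matches:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolation: precision at recall r becomes the maximum precision
    # observed at any recall greater than or equal to r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical example: three detections (TP, FP, TP) against two ground truth boxes.
ap = average_precision([True, False, True], num_ground_truth=2)
print(precision_recall(tp=2, fp=1, fn=0), ap, mean_average_precision([ap]))
```

With a single class, as in windshield detection, mAP equals the AP of that class.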
The performance evaluation for passenger and seat belt classification used an accuracy parameter calculated as in Eq. (6).

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

Table II displays

IV. EXPERIMENTS AND RESULTS
The experiments were conducted on each subsystem using different datasets. The windshield detection, passenger classification, and seat belt classification subsystems used Dataset1, Dataset2, and Dataset3, respectively. The experiments were performed using YOLOv8, and the model was trained and tested on a Tesla T4 GPU.
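For readers who want to reproduce a comparable setup, the sketch below shows how the settings reported for the windshield detection experiment in this section (640×640 input, batch size 16, 100 epochs, Adam optimizer) might be passed to the Ultralytics Python package. The dataset file name `windshield.yaml` and the helper names are hypothetical, and the exact scripts used in the experiments are not given in this paper, so this is an assumption-laden outline rather than the authors' code.

```python
def windshield_train_args(data_yaml):
    """Training settings reported for the windshield detection experiment."""
    return {
        "data": data_yaml,     # dataset description file (hypothetical name below)
        "imgsz": 640,          # 640x640 input size
        "batch": 16,           # batch size
        "epochs": 100,         # number of epochs
        "optimizer": "Adam",   # optimizer
    }


def train_windshield_detector(weights="yolov8s.pt", data_yaml="windshield.yaml"):
    """Train one YOLOv8 detection model; requires the ultralytics package."""
    from ultralytics import YOLO  # imported lazily so the sketch loads without it

    model = YOLO(weights)
    return model.train(**windshield_train_args(data_yaml))


# The five model sizes compared in the experiment, as pretrained weight names.
MODEL_WEIGHTS = ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov8l.pt", "yolov8x.pt"]

if __name__ == "__main__":
    # Only print the planned runs; call train_windshield_detector() to start training.
    for weights in MODEL_WEIGHTS:
        print(weights, windshield_train_args("windshield.yaml"))
```

Keeping the settings in one function makes it straightforward to run the same configuration across all five model sizes.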

A. Windshield Detection
The first experiment was conducted for windshield detection using five YOLOv8 models: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. All experiments used 640×640 pixels for input size, 16 for batch size, 100 for epochs, and Adam for optimizer. Table III shows the experiment result of windshield detection. The best precision and recall were achieved using the YOLOv8s and YOLOv8n models, respectively, while the fastest training time was obtained with YOLOv8s. Additionally, the best mAP was achieved using the YOLOv8s model at 0.960. Fig. 6, Fig. 7, and Fig. 8 display the precision-recall, precision-confidence, and recall-confidence curves of this experiment, respectively. Meanwhile, Fig. 9 displays examples of windshield detection results. The proposed method was compared to the previous research, as shown in Table IV. Here, the study in [9] used the YOLOv5s model with 48 for batch size, 0.0001 for learning rate, and Adam for the optimizer. The table shows that the proposed method outperformed the previous method in precision, recall, and mAP. The proposed method increases mAP by 2% and decreases training time by 0.54 times compared to the YOLOv5s model. Therefore, the proposed method is more suitable for real-time windshield detection.

V. CONCLUSION
This research developed a system for detecting drivers and passengers who violate seat belt regulations. The proposed system consists of three subsystems: windshield detection, passenger classification, and seat belt classification. The proposed methods in each subsystem used YOLOv8. The experiment was conducted by comparing five models. Furthermore, three batch sizes were compared in passenger and seat belt classification. The results show that the proposed method achieved an mAP of 0.960 for windshield detection. Accuracies of 0.8923 and 0.8846 were achieved for passenger and seat belt classification, respectively. Moreover, the proposed method excelled in accuracy and training time compared to YOLOv5 for windshield detection. Therefore, the proposed method is more suitable for detecting drivers or passengers who violate seat belt regulations in real time. For future research, a license plate detection subsystem could be added to identify vehicles.
Some researchers aimed to improve accuracy and speed with the development of the CNN model. Zhou et al. developed the AlexNet model by adding Batch Normalization (BN), known as BN-AlexNet [4]. The proposed method increased the average accurate detection and reduced training time compared to AlexNet, VGGNet-16, and GoogLeNet. Chun et al. developed the CNN model known as NADS-Net using the feature pyramid network (FPN) backbone and multiple detection heads method [6]. Additionally, Yang et al. proposed a method focused on the Central Processing Unit (CPU) and real-time implementation. The proposed method combined traditional operators (texture extraction), SSD MobileNet V2, and particle filter tracking algorithms [35].

Fig. 9. Example of windshield detection from video frame images.

TABLE I. NUMBER OF TRAINING AND TESTING DATA

TABLE III. EXPERIMENT RESULT OF WINDSHIELD DETECTION

TABLE IV. COMPARISON BETWEEN THE PROPOSED METHOD AND PREVIOUS RESEARCH METHOD (YOLOV5)

Adam for optimizer, and 20 for early stopping. Table V shows the results of this experiment. The highest accuracy reaches 0.8923 using YOLOv8s-cls and YOLOv8l-cls. Table VI is a confusion matrix of the best result of passenger classification. Furthermore, the fewest epochs and the shortest training time are achieved using the YOLOv8n-cls model with 16 for the batch size.

TABLE V. EXPERIMENT RESULT OF PASSENGER CLASSIFICATION

The seat belt classification experiment used five models and three batch sizes, the same as in the passenger classification experiment. Table VII shows the results of this experiment. The highest accuracy reaches 0.8846 using the YOLOv8l-cls model and 16 for batch size. Table VIII is a confusion matrix of the best experiment result. The fastest training time reaches 0.126 hours using the YOLOv8n-cls model and 16 for batch size.
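The relationship between a confusion matrix and the reported accuracy can be illustrated with a few lines of Python. The counts below are hypothetical and are not the values from the tables in this paper; the row and column convention is likewise an assumption made for this example.

```python
def accuracy(confusion):
    """Accuracy from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j,
    so the diagonal holds the correctly classified samples.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts for two classes (with seat belt, without seat belt).
example = [
    [90, 10],  # true "with seat belt": 90 correct, 10 misclassified
    [5, 95],   # true "without seat belt": 5 misclassified, 95 correct
]
print(accuracy(example))  # 185 correct out of 200 samples -> 0.925
```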

TABLE VII. EXPERIMENT RESULT OF SEAT BELT CLASSIFICATION