Modeling the Estimation Errors of Visual-based Systems Developed for Vehicle Speed Measurement

—This paper models the relationship between the error of visual-based systems developed for vehicle speed estimation (the dependent variable) and the detection region length, the camera angle, and the volume-to-capacity ratio (V/C) (the independent variables). Traffic simulation software (VISSIM) is used to generate a set of video clips of predefined traffic based on different values of the independent variables. These videos are analyzed with a video-based detection and tracking model (VBDATM) developed in 2015. Errors are expressed as the difference between the actual speed generated by VISSIM and the speed computed by the VBDATM, divided by the actual speed. The results of a forward stepwise regression analysis show that the V/C ratio does not affect the accuracy of the estimates and that there are weak relationships between the estimation error and both the camera position and the detection region length.


I. INTRODUCTION
Intelligent Transport Systems (ITS) have become one of the most important fields of research for traffic planners in recent years. ITS have been developed to improve traffic conditions, detect anomalous traffic, and ensure the safe operation of transportation [1]. As a fundamental task in ITS, vehicle detection and tracking aims to provide traffic management centers with necessary information such as traffic volume and traffic speed. Of all parameters, speed is often a critical element in both macroscopic and microscopic traffic analysis [2]. Current systems for vehicle speed measurement include induction loop detectors, magnetic strips, laser sensors, ultrasonic technology, video-based technology, and so on [3]. Compared with other technologies, a video-based system has many advantages: it is easy to install, operate, and maintain, and it is low in cost. Moreover, it has a wide monitoring range and enables users to obtain rich information, with the ability to review recorded footage whenever needed [4].
The video-based approach, or Video Image Processing (VIP) technique, has grown rapidly in recent years. Numerous mathematical models have been developed to calculate traffic parameters, especially vehicle speed, from video sequences. In general, the algorithms used to estimate vehicle speed go through three main stages: detecting a moving vehicle, tracking it, and then calculating its speed.

A. Vehicle Detection
The most common vehicle detection methods in the literature use background subtraction techniques. Background subtraction is based on the idea of identifying the portions of the image that remain unchanged in successive frames. The background model is subtracted from each frame of the video, and the remaining areas are blobs that indicate the vehicles present in the scene [5]-[15]. In addition to these background segmentation techniques, some researchers have recently used Convolutional Neural Networks (CNNs), which can precisely detect vehicles in a single static video frame. This approach relies on machine learning over a training database to classify and detect objects by searching for features [16]-[19]. The optical flow technique is also widely used: moving vehicles are extracted from the dynamic background by exploiting how the optical flow of the moving target changes with time, establishing the optical flow constraint equation in order to detect the target [20]-[23].
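As a minimal illustration of the background subtraction idea (a generic sketch, not any specific cited method), a frame can be differenced against a background model and thresholded to obtain a foreground mask; the function name and threshold here are illustrative:

```python
import numpy as np

def detect_blobs(frame, background, threshold=30):
    """Subtract a background model from a frame and return a binary
    foreground mask; pixels differing by more than `threshold` are
    treated as moving-vehicle candidates."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy example: a static background and one frame with a bright "vehicle" patch.
background = np.zeros((8, 8), dtype=np.uint8)
frame = background.copy()
frame[2:4, 3:6] = 200          # a 2x3 moving object

mask = detect_blobs(frame, background)
print(mask.sum())              # number of foreground pixels -> 6
```

In practice the mask would be cleaned with morphological operations and connected-component labeling to extract individual blobs, as the cited works describe.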

B. Vehicle Tracking
Tracking can be defined as estimating vehicles' trajectories in the image as they move through the scene and plotting the movement path on the road [24]. The literature indicates that the common methods for tracking moving objects are silhouette-based tracking, kernel-based tracking, and point-based tracking. Silhouette-based approaches are used for tracking complex shapes such as human features (head, hands, and shoulders) [25], [26]. In kernel-based methods, the moving object is represented by a geometric shape (a rectangle or an ellipse); a weight is assigned to each pixel in the shape, and the density gradient in image coordinates is then estimated, using these weights, to determine the new position of the object [27]-[29]. The point tracking method has been adopted by most researchers concerned with calculating the speed of moving vehicles [6], [9]-[15], [19], [30], [31]. This technique recognizes a vehicle's features and marks the points to be followed, establishes correspondences between frames, and then connects the positions of the same points of the vehicle across the frame sequence. The choice of points (features) to be followed is essential for tracking accuracy [17]. Object tracking algorithms make it possible to follow the path taken by an object through a set of video frames; this block therefore yields the distance traveled by a vehicle as well as the number of frames the vehicle took to cover that distance. To ensure temporal consistency, it is preferable that the interval between frames be constant and short and that there be no sudden changes in the direction of the object [32], [33].

C. Speed Calculation
From a practical point of view, methodologies for calculating the speed of moving vehicles can be classified into two categories: the time-based algorithm and the distance-based algorithm. In the first, a specific number of successive frames is used, and the displacement of the tracked point between the first and last frame is calculated in pixels. Using both the frame rate and the dimensions of one of the road features shown in the video, the time and displacement distance can be converted to metric units [6], [12], [15], [19].
The distance-based algorithm allows the user, before processing starts, to select four points that represent the vertices of the detection area (region of interest, ROI). This area is a quadrilateral: two sides match the edges of the road, and the other two sides are perpendicular to the road axis and represent the entry line and the departure line of the area. The four points are selected based on specific features of the road, so the distance (in metric units) between the entry and departure lines is known. Based on the video frame rate and the number of frames the vehicle takes to travel through the ROI, the algorithm calculates the time and thus the speed [8]-[11], [13], [14], [16], [17], [20].
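The distance-based calculation above reduces to a one-line formula; a minimal sketch (function name and example values are illustrative, not taken from the paper):

```python
def distance_based_speed(roi_length_m, n_frames, fps):
    """Speed through a region of interest of known real-world length:
    time = frames / fps, speed = distance / time (m/s)."""
    travel_time_s = n_frames / fps
    return roi_length_m / travel_time_s

# A vehicle crossing a 15 m ROI in 27 frames of a 20 fps video:
speed_ms = distance_based_speed(15.0, 27, 20)
print(round(speed_ms * 3.6, 1))   # convert m/s to km/h -> 40.0
```

The time-based algorithm is the same arithmetic with the roles reversed: the frame count is fixed and the displacement is measured from the tracked point.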

D. Problem Statement
The review of the literature indicates that most researchers attribute the inaccuracy of results to weaknesses in the detection and tracking algorithms. Most of the research also did not address the effect of traffic volume on the accuracy of measuring vehicle speed. On the other hand, only a limited number of studies applying the second methodology (the distance-based algorithm) have examined the effect of the ROI length on the accuracy of speed measurement. Wicaksono and Setiyono [10] exploited the fact that vehicles traveling at a constant speed toward the camera appear to speed up as they get closer. Based on distance from the camera, the authors divided the captured road into three regions. Moreover, they used three different camera angles (45, 50, and 60 degrees) to capture traffic in each region. They concluded that the best ROI for 45 and 50 degrees was the closest region, while for 60 degrees it was the full region. Al Kherret et al. [13] validated the model they developed for six ROI lengths (5, 10, 15, 20, 25, and 30 m) and concluded that the best model accuracy is associated with ROI lengths of 10 and 15 m. Javadi et al. [20] adopted four intrusion lines with a spacing of approximately three meters to define three regions of interest (2.87, 5.95, and 8.97 m). Their results indicated that using cameras with higher frame rates and changing the distances between intercept lines could reduce the error rate in the measurements.
This study aims to provide a new perspective for other researchers by modeling the relationship between the error in video-based vehicle speed measurement and the length of the detection area, the shooting angle, and the traffic volume. The remaining sections of this paper are structured as follows: Section 2 explains the data collection; Section 3 presents the statistical data analysis; Section 4 explains the model development and discusses the results; and Section 5 presents the conclusions.
II. DATA COLLECTION
Data collection was accomplished in two stages. The first stage was achieved using traffic simulation software (VISSIM) to generate visualizations of traffic operations (video files) and to produce statistical data saved in text files. Simulation software was used in this research to ensure the quality of the recorded traffic video, thereby avoiding the errors that usually result from applying the algorithms used to detect and track moving vehicles, and to exploit the ability of such programs to provide all the necessary data. In the second stage, a video-based detection and tracking model (VBDATM), developed by Alkherret et al. [13], was used to detect and track moving vehicles in the recorded videos and to collect traffic data such as traffic count, speed, and headways.

A. The Simulation Software Package
The simulation environment is a very important layer of the vision-based approach. VISSIM is a microscopic traffic flow simulation software package used to analyze traffic and transit operations under user-defined constraints and to evaluate alternatives based on transportation engineering and planning measures of effectiveness [34]. Traffic simulation involves user data inputs, transformation of the base input data into analyzed statistical data, and generation of outputs in the form of 2D and 3D animations and database files [35].
According to the VISSIM User Manual [34], the base data provided in VISSIM can be classified into: geometric data of the roadway network; traffic control strategies such as speed limit, time gap, vehicle mix rate, vehicle classes, and so on; general demands, including initial volume inputs, turning volumes, and the route choice process; performance data used for model calibration to minimize the difference between simulation data and calibration data; and future demands used for estimation and forecasting. The characteristics of VISSIM, such as car following, lane-change logic, and the ability to compare alternatives within the same scenario, make it a useful tool in this study.

B. Input Parameters of the Simulation Software
In this research, a one-way, two-lane road with a lane width of 3.6 m was created in VISSIM, and a 30-meter section was specified as the region of interest (ROI) for collecting traffic data on vehicles as they traversed it. Within the ROI, twenty-six data collection points were distributed with a uniform spacing of 2.5 meters across the two lanes, as shown in Fig. 1. Furthermore, the models "Car1" to "Car6" were defined, and the "without lane change" option was enforced by using a high value for the time gap parameter in the "Driving Behavior Parameter Sets" dialog box. Three positions were selected for the camera: in the first (Pos. I), the camera sight line was parallel to the road's longitudinal axis; in the second (Pos. II), it was perpendicular to the road's longitudinal axis; and in the third (Pos. III), the horizontal angle between the camera sight line and the road's longitudinal axis (the pan angle) was 45 degrees. The camera height in all positions was set at 30 meters, with a large focal length to provide a sufficient field of vision. Fig. 2 to Fig. 4 show the viewing fields of the three positions.
Six values of the volume-to-capacity ratio (V/C) were used: 30%, 50%, 70%, 80%, 90%, and 100%. Six video clips were recorded for each of the three previously defined camera positions, so the traffic simulation was run eighteen times. Table I shows the recording period (sec) of the video clips corresponding to each V/C ratio; longer recording periods were selected for low V/C values and vice versa. Varying the recording period in this way ensured the presence of a sufficient number of vehicles within the monitored section.
As mentioned above, after each run VISSIM creates two main files: an Audio Video Interleave (AVI) file and a text file containing traffic data. VISSIM records AVI files that play back at a constant rate of 20 frames (pictures) per second. Because each simulation time step results in one picture, the actual playback speed of the AVI file depends on the simulation resolution (time steps per simulation second) during recording: if 10 time steps are chosen (the recommended value), the playback is twice as fast as real time; with only 1 time step, the playback is 20 times faster than real time.
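The relation between simulation resolution and playback speed stated above can be expressed directly; a small sketch assuming only the figures given (a 20 fps AVI, one picture per time step):

```python
def playback_speedup(time_steps_per_sim_second, avi_fps=20):
    """Each simulation time step yields one AVI picture; the AVI plays
    at a fixed `avi_fps`, so the apparent speedup relative to real
    time is avi_fps / time_steps_per_sim_second."""
    return avi_fps / time_steps_per_sim_second

print(playback_speedup(10))  # recommended resolution -> 2.0 (twice real time)
print(playback_speedup(1))   # -> 20.0 (twenty times real time)
```

Only at 20 time steps per simulation second would the AVI play back in real time.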
Data in the text file depend on the parameters defined by the user before running the simulation. In this research, speeds were attributed to the data collection point number, the start/end time of the aggregation interval, and the vehicle length. These data were processed and prepared to suit the requirements of this research.

In the third section of the model, vehicles are detected using a stream-processing loop. The loop reads an input video frame, converts the colored frame to a binary image, removes small objects (noise), erodes the binary image, and generates the final scene. These steps are shown in Fig. 5.
The data collection phase begins with estimating the center coordinates and bounding boxes of the blobs in the foreground image using the initialized system object. Then the frame number, X-center, Y-center, lane, and shortest distance (Short_Dist) are stored in an array of six columns whose number of rows equals the number of effective frames of the video clip; effective frames are only those in which vehicles are within the ROI. Table II shows an example of the major output of the processing of a video clip.
These raw data are used to keep track of already detected vehicles and to isolate the data of each vehicle. Taking into account the frame rate of the recorded videos, and based on comparing the data recorded in successive frames, each vehicle was tracked individually and its speed was calculated as it passed through the ROI.
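The per-vehicle isolation step could, for instance, match detections frame to frame by proximity within the same lane. The following is a simplified illustrative sketch, not the VBDATM's actual algorithm; the tuple layout and the `max_jump` threshold are assumptions:

```python
def group_by_vehicle(detections, max_jump=15.0):
    """Assign each detection (frame, x, y, lane) to a track by matching
    it to the nearest track that ended in the previous frame of the
    same lane, within `max_jump` pixels; otherwise start a new track."""
    tracks = []
    for frame, x, y, lane in sorted(detections):
        best = None
        for t in tracks:
            pf, px, py, pl = t[-1]
            if pl == lane and frame - pf == 1 and abs(x - px) <= max_jump:
                if best is None or abs(x - px) < abs(x - best[-1][1]):
                    best = t
        if best is None:
            tracks.append([(frame, x, y, lane)])
        else:
            best.append((frame, x, y, lane))
    return tracks

# Two vehicles in the same lane, advancing 10 px per frame, 100 px apart:
dets = [(1, 0, 0, 1), (1, 100, 0, 1),
        (2, 10, 0, 1), (2, 110, 0, 1),
        (3, 20, 0, 1), (3, 120, 0, 1)]
print(len(group_by_vehicle(dets)))  # -> 2 separate tracks
```

Once each vehicle's detections are isolated, its speed follows from the frame count between entering and leaving the ROI, as in the distance-based algorithm described earlier.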
The model was validated using speed data collected from video clips recorded with the camera at Position II only. A comparative analysis was established to test whether or not the developed model can produce speeds close to the actual speeds reported by VISSIM for the six ROI lengths mentioned above. In general, the results of the analysis demonstrated that the VBDATM is a valuable tool for collecting speed data and other essential traffic data [13].

III. STATISTICAL DATA ANALYSIS
This paper aims to estimate the error in the measurements of a video-based detection and tracking model developed for collecting traffic data, and to develop a model that formulates the relationship between the accuracy of the measurements and the length of the detection area (i.e., the length of the ROI), the V/C ratio, and the camera position as potential influences on this accuracy. The first step of the analysis was to prepare the VISSIM data. The VISSIM database contains speed values attributed to the data collection point number, the start/end time of the aggregation interval, and the vehicle length, as shown in Fig. 6.
Using "FOR" loops, a macro was created in the MATLAB environment to sort and arrange these data into 26 Excel sheets in three files representing the three camera positions. Each sheet represents the information recorded by one data collection point. An example of the final form of the sorted data set is shown in Fig. 7.
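The sorting step can be illustrated in Python (the paper used a MATLAB macro writing Excel sheets; the record fields below are assumed from the description of the VISSIM output):

```python
from collections import defaultdict

def sort_by_collection_point(records):
    """Group speed records by data-collection-point number, mimicking
    the macro that split the VISSIM output into one sheet per point
    (26 points per camera position)."""
    by_point = defaultdict(list)
    for rec in records:
        by_point[rec["point"]].append(rec)
    return by_point

records = [
    {"point": 1, "t_start": 0.0, "t_end": 1.0, "speed": 52.3, "veh_len": 4.2},
    {"point": 2, "t_start": 0.0, "t_end": 1.0, "speed": 49.8, "veh_len": 4.5},
    {"point": 1, "t_start": 1.0, "t_end": 2.0, "speed": 51.1, "veh_len": 4.2},
]
grouped = sort_by_collection_point(records)
print(len(grouped[1]))  # -> 2 records at point 1
```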
The data resulting from the previous procedure were stored in two Excel files: the first contained the right-lane data, while the second was devoted to the left-lane data. These files were used to determine whether the differences between the mean vehicle speeds in the right and left lanes were statistically significant, and hence whether each lane should be studied alone or the two combined in a single database.

A. Descriptive Statistics of Actual and Estimated Speeds
Speed data were examined using the "Descriptive Statistics" tool in Excel. The average, standard deviation, maximum, and minimum values of the estimated speeds were classified according to the length of the ROI and the camera position, alongside the corresponding values output by VISSIM. This arrangement was adopted for all V/C ratios, and the numbers of observations are included. The descriptive statistics of actual and estimated speeds are given in Table III and Table IV for the right lane and in Table V and Table VI for the left lane. Estimation errors represent the difference between the actual speed (from VISSIM) and the estimated speed (from the VBDATM) divided by the actual speed. The mean errors in estimating speeds in the right and left lanes for each V/C ratio, ROI length, and camera position are listed in Table VII.

For the right lane, the results showed that the maximum error was 4.32 km/h for speeds estimated at camera position I, while the minimum error was 0.29 km/h; the maximum error occurred when the ROI length was 10 m and the V/C ratio was 30%. In the video clip recorded at position II, the maximum error was 2.65 km/h, corresponding to a 5 m ROI length and a 50% V/C ratio, and the minimum error was 0.29 km/h, estimated when the ROI length was 25 m and the V/C ratio was 30%. At position III, the maximum error was 2.65 km/h, corresponding to a 5 m ROI length and an 80% V/C ratio, while the minimum error was 0.86 km/h, corresponding to a 15 m ROI length and a 70% V/C ratio.
For the left lane, the results showed that the maximum error was 4.22 km/h for speeds estimated at camera position I, while the minimum error was 0.81 km/h; the maximum error occurred when the ROI length was 15 m and the V/C ratio was 90%. In the video clip recorded at position II, the maximum error was 3.14 km/h, corresponding to a 5 m ROI length and a 100% V/C ratio, and the minimum error was 0.27 km/h, estimated when the ROI length was 25 m and the V/C ratio was 100%. At position III, the maximum error was 2.68 km/h, corresponding to a 5 m ROI length and a 100% V/C ratio, while the minimum error was 0.92 km/h, corresponding to a 10 m ROI length and a 70% V/C ratio.
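The error metric defined above (the difference between the actual and estimated speeds, divided by the actual speed) can be written compactly; the example values are illustrative, not data from the study:

```python
def relative_error(actual, estimated):
    """Estimation error as defined in the paper: the difference between
    the actual (VISSIM) speed and the estimated (VBDATM) speed,
    divided by the actual speed."""
    return abs(actual - estimated) / actual

# An actual speed of 60 km/h estimated as 57 km/h gives a 5% error:
print(round(relative_error(60.0, 57.0), 3))  # -> 0.05
```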
The previous results give an initial impression that there is no clear relationship between the speed estimation error and the ROI length, the V/C ratio, or the camera position. However, studying the effect of these factors on the estimation accuracy requires performing a regression analysis on all the data. Before conducting this analysis, the differences between the mean vehicle speeds in the right and left lanes should be assessed.

B. Difference between Data of the Two Lanes
One-way analysis of variance (ANOVA) was employed to investigate whether there are differences between the speed values in the right and left lanes. One-way ANOVA tests the equality of population means when classification is by one variable; in other words, it is used when there is only one way (here, the lane) to classify the populations of interest [36]. In this study, one-way ANOVA was carried out to compare the mean errors in estimating speeds in the right and left lanes for each V/C ratio, ROI length, and camera position. For P-values less than 0.05, the null hypothesis can be rejected.
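For reference, the one-way ANOVA F statistic used here is the between-group mean square divided by the within-group mean square; a self-contained sketch with illustrative speed values (not data from this study):

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group mean square
    over within-group mean square."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical right- and left-lane speed samples (km/h):
right = [50.0, 52.0, 51.0, 53.0]
left = [55.0, 57.0, 56.0, 58.0]
print(round(one_way_anova_f(right, left), 1))  # -> 30.0
```

The P-value is then obtained from the F distribution with (k-1, n-k) degrees of freedom; statistics packages such as SAS report it directly.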
The results showed that in most cases (67%), there was a statistically significant difference (P-value < 0.05) at the 95% confidence level between the mean speeds of vehicles in the two lanes (Table VIII). Accordingly, the speeds estimated in the right and left lanes were prepared separately in order to determine the relationship between the measurement accuracy and the ROI length, the V/C ratio, and the camera position.

IV. MODELS DEVELOPMENT
Statistical Analysis Software (SAS V8) was used to determine the relationship between the estimation error, as the dependent variable, and the ROI length, the V/C ratio, and the camera position, as independent variables. SAS is capable of handling multiple regression analysis. In this study, the stepwise regression technique was used so that independent variables that are not statistically significant at the 95% confidence level are automatically removed from the models.
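Forward stepwise selection of this kind can be sketched as follows. This is a simplified illustration using an R²-gain criterion instead of SAS's F-test, and the synthetic data merely mimics the qualitative finding reported later (the error depends weakly on ROI length and camera position but not on V/C); all names and coefficients here are assumptions:

```python
import numpy as np

def forward_stepwise(X, y, names, min_r2_gain=0.01):
    """Forward stepwise selection sketch: repeatedly add the candidate
    column that most increases R^2, stopping when the best gain falls
    below `min_r2_gain`."""
    selected, remaining = [], list(range(X.shape[1]))
    best_r2 = 0.0
    sst = ((y - y.mean()) ** 2).sum()
    while remaining:
        gains = []
        for j in remaining:
            A = np.column_stack([np.ones(len(y)), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = ((y - A @ coef) ** 2).sum()
            gains.append((1 - sse / sst, j))
        r2, j = max(gains)
        if r2 - best_r2 < min_r2_gain:
            break
        best_r2 = r2
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected]

# Synthetic data where the error depends on ROI length and camera
# position but not on the V/C ratio:
rng = np.random.default_rng(0)
L_roi = rng.uniform(5, 30, 200)
cam = rng.integers(1, 4, 200).astype(float)
vc = rng.uniform(0.3, 1.0, 200)
err = 0.08 - 0.002 * L_roi + 0.01 * cam + rng.normal(0, 0.002, 200)
X = np.column_stack([L_roi, cam, vc])
print(forward_stepwise(X, err, ["L_ROI", "CP", "V/C"]))  # selects L_ROI and CP, drops V/C
```

SAS's SELECTION=STEPWISE additionally re-tests already entered variables for removal at each step; the forward-only variant above is sufficient to show why an uninformative predictor such as V/C never enters the model.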
where:
CP = the camera position;
L_ROI = length of the region of interest (m).
In both models, the negative sign of the coefficient of L_ROI means that the speed estimation error (E_R for the right lane and E_L for the left lane) decreases as the length of the region of interest increases.

V. CONCLUSIONS
The main goal of this study was to present a new perspective for other researchers by developing multiple linear regression models of the relationship between the errors of visual-based models used for vehicle speed estimation (the dependent variable) and the detection region length, the camera angle, and the volume-to-capacity ratio (V/C) (the independent variables). Traffic simulation software, VISSIM, was employed to generate a set of videos of virtual traffic under the following configurations: a one-way, two-lane road; three camera positions; six values of the volume-to-capacity ratio; and six distances over which vehicle speed was measured. These videos were analyzed with a video-based detection and tracking model (VBDATM) developed by Alkherret et al. (2015) [13]. Estimated measurement errors were expressed as the difference between the actual speed generated by VISSIM and the speed computed by the VBDATM, divided by the actual speed.
In conclusion, the regression models showed that the V/C ratio does not affect the accuracy of the estimates and that there are weak relationships between the estimation error and both the camera position and the detection region length. The two models clearly indicate that increasing the length of the region of interest increases the accuracy, in other words, reduces the measurement error. Moreover, changing the camera position improves the accuracy of the measurements.
A limitation of the methodology used in this research is its reliance on an ideal traffic visualization environment generated by a simulation program. The author proposes to apply and extend the methodology to real-world traffic, taking into account that, in the real world, camera stability and image clarity cannot be guaranteed, and the angle and location of the video camera may not be under the analyst's control. Furthermore, although the accuracy of the results is affected by the camera location, as is evident in the developed models, choosing the ideal location for capturing the video remains subject to the costs and capabilities available to the study and implementing authorities.