Estimation of Trajectory and Location for Mobile Sound Source

This paper presents an approach for estimating the trajectory of a mobile sound source from a stereo sound recording. An artificially created sound source signal is used. The azimuth angle between the sound source and the receiver is calculated throughout the movement, using the Interaural Time Difference (ITD) to determine the angle. The delay between the channels is estimated with different types of Generalized Cross Correlation (GCC) algorithms, and several pre-filtering methods are tried for noise removal and smoothing of the recorded signals. An approach for sound localization and trajectory estimation in 2D space is proposed, and some basic parameters of the sound localization process are explained. The average azimuth angle error is compared across the different GCC and pre-filter combinations. The results show that the location and trajectory of a mobile object can be estimated successfully from a stereo sound recording.

Keywords: sound processing; sound source localization; azimuth angle estimation; generalized cross-correlation; interaural time difference; interaural level difference


I. INTRODUCTION
Sound is generally defined as a change of pressure over time, which is measured by a microphone or a microphone array. These pressure changes are used for sound recognition, localization, classification, and similar tasks. Sound localization methods are generally inspired by the human hearing system [1][2]. Sound processing covers several research areas [3], and one of the most important is sound localization [4]. With advances in technology, sound localization is now used in many different areas [5][6], such as security systems, cell phones, voice command applications, and conference systems. Sound localization means determining the coordinates of a sound source in 2D or 3D space [7][8]. 2D coordinate information is adequate for some basic applications [9][10], whereas 3D coordinates are necessary for more complex sound processing applications. In this study, we obtain the 2D coordinates of a moving person from a stereo sound recording [11]. An artificially created sound dataset is used for this work [12][13][14].
The rest of the paper is organized as follows. Section II explains sound localization. Section III describes 2D sound localization with a microphone pair. Sound delay estimation and sound source trajectory estimation are carried out in Sections IV and V, respectively. Finally, conclusions are given in the last section.

II. SOUND LOCALIZATION
Sound localization means determining the coordinates of sound sources by using signal processing methods. Generally, at least three microphones or sensors are required for localization.
We propose a different approach in this paper. In contrast to other studies, the locations of sound sources are obtained with only two microphones. Two common parameters of the sound localization process are used: the ITD and the Interaural Level Difference (ILD) [15][16]. The ITD and ILD are the time difference and the amplitude difference, respectively, between the signals received by the microphones or sensors [17]. The ITD is generally used for determining the azimuth angle [18][19]. ITD and ILD are illustrated in Fig. 1 and Fig. 2, respectively. The ITD is much more reliable than the ILD and is easily calculated by cross-correlation methods. The resulting ITD is expressed as a sample difference between the received signals. The azimuth angle is calculated from this parameter, and the sampling frequency is taken as $f_s = 44100$ Hz in this study. The calculation of the azimuth angle is shown in Fig. 3. The speed of sound is very sensitive to environmental conditions, especially ambient humidity and temperature [20].

The speed of sound as a function of ambient temperature and the azimuth angle are calculated by (1) and (2), respectively [21].
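As an illustration of how relations such as (1) and (2) can be evaluated, the following minimal Python sketch converts an ITD given in samples into an azimuth angle. It assumes the common linear approximation c ≈ 331.3 + 0.606 T m/s for dry air and a far-field model θ = arcsin(cτ/d); the exact forms of (1) and (2) may differ. The function names and the 0.2 m microphone spacing are hypothetical and are not taken from this study.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius.

    Assumption: linear approximation, valid near room temperature.
    """
    return 331.3 + 0.606 * temp_c


def azimuth_from_itd(delay_samples: float, fs: float, mic_distance: float,
                     temp_c: float = 20.0) -> float:
    """Far-field azimuth angle in degrees from an ITD expressed in samples.

    mic_distance is the spacing between the two microphones in metres
    (a hypothetical parameter; the paper does not state its value here).
    """
    tau = delay_samples / fs                      # delay in seconds
    ratio = speed_of_sound(temp_c) * tau / mic_distance
    ratio = max(-1.0, min(1.0, ratio))            # clamp against noisy delays
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # Example: a 5-sample delay at 44.1 kHz with microphones 0.2 m apart.
    print(azimuth_from_itd(5, 44100, 0.2))
```

The clamp keeps the arcsine defined when a noisy delay estimate exceeds the largest delay that the microphone spacing physically allows.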

III. 2D SOUND LOCALIZATION VIA MICROPHONE PAIR
In this section, 2D localization with a microphone pair is explained in detail. The sound localization problem is usually solved with three or more microphones, using the Time Difference of Arrival (TDOA) between microphone pairs [22][23]. The advantage of the approach used here is that the x and y coordinates can be determined with only one microphone pair. However, not only the ITD but also the ILD of the sound signals must be obtained [24]. In this approach, the inverse square law of energy propagation is used for sound localization. The received sound signals are defined as $x_1$ and $x_2$ [21]:

$$x_i(t) = \frac{s(t - \tau_i)}{d_i} + n_i(t), \quad i = 1, 2 \tag{3}$$

where $s(t)$ is the source signal and $n_i(t)$ represents the white noise in the environment.
The energy received by a microphone is

$$E_i = \int_0^{w} x_i^2(t)\, dt \tag{4}$$

where $w$, $E_i$, and $d_i$ are the window length of the observation, the energy of the sound signal $x_i$, and the distance between the sound source and the i-th receiver, respectively.
The relationship between the energies and distances can be obtained from these equations as

$$E_1 d_1^2 = E_2 d_2^2 + \eta, \qquad d_i = \sqrt{(x_i - x_s)^2 + (y_i - y_s)^2} \tag{5}$$

where $\eta$ is the error term, and $(x_i, y_i)$ and $(x_s, y_s)$ are the coordinates of the i-th microphone and the sound source, respectively.
Since the signals acquired by each microphone can be assumed to be delayed replicas of the source signal, localizing a source requires time delay estimation (TDE) between the signals of the two microphones. In the TDE model, $x_i(t) = s(t - \tau_i) + n_i(t)$, where $\tau_i$ is the time delay and $d_i$ is ignored. Once the time delay is measured, the source position satisfies a hyperbolic equation in Cartesian coordinates:

$$d_1 - d_2 = c\, \tau_{12} \tag{6}$$

where $c$ and $\tau_{12}$ are the speed of sound and the TDOA between mic1 and mic2, respectively. By using (4) and (5),

$$d_1^2 = \frac{d_2^2}{k^2} + \frac{\eta}{E_1} \tag{7}$$

is obtained, where $k = \sqrt{E_1 / E_2}$. The noise term $\eta / E_1$ can be ignored in a high Signal to Noise Ratio (SNR) environment, which gives $d_2 = k\, d_1$. Inserting this into (6) yields

$$(x_s - x_1)^2 + (y_s - y_1)^2 = \left( \frac{c\, \tau_{12}}{1 - k} \right)^2 \tag{8}$$

$$(x_s - x_2)^2 + (y_s - y_2)^2 = \left( \frac{k\, c\, \tau_{12}}{1 - k} \right)^2 \tag{9}$$

The exact source position can be found by combining (8) and (9), which describe two circles centered at the microphones. The existence of a solution is defined by (10), where $d$ is the distance between the two circle centers. When $\tau_{12}$ is greater than zero, the sound reaches mic1 later than mic2, and $k$ is less than one. Thus, $\tau_{12}$ and $(1 - k)$ always have the same sign. In the case of $E_1 \neq E_2$, the source position can be determined by (11). In the case of $E_1 = E_2$, there is no intersection from which to determine the source position. To solve this problem, (8) and (9) are used again and rewritten in matrix form, where $R_s$ is the source coordinate.
Equation (21) is obtained by inserting (20) into (15). Its positive root is the square of the distance from the source to the origin. The final source coordinate can be found by using the $R_s$ calculation in (20). The dual-microphone source location method is used in 2-D space. This method is applied to an artificially created database. The ambient temperature is selected as 20 °C for this work. The stereo sound recording is shown in Fig. 5.
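The energy-ratio and TDOA relations of this section can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this study. It assumes noise-free energies, uses d_1 = cτ_12/(1 - k) and d_2 = k d_1 with k = sqrt(E_1/E_2), and finds the source with the standard two-circle intersection from plane geometry. The function name `locate_source`, its signature, and the default speed of sound are hypothetical.

```python
import math


def locate_source(mic1, mic2, e1, e2, tau12, c=343.0):
    """Return candidate (x, y) source positions from energies and TDOA.

    mic1, mic2 : (x, y) microphone coordinates in metres
    e1, e2     : signal energies measured at mic1 and mic2
    tau12      : TDOA in seconds; positive when mic1 receives the sound later
    Assumption: high SNR, so the noise term in the energy relation is dropped.
    """
    k = math.sqrt(e1 / e2)
    if math.isclose(k, 1.0):
        # E1 == E2: the radii are undefined and need separate handling.
        raise ValueError("equal energies: circle radii are undefined")
    d1 = c * tau12 / (1.0 - k)      # distance from source to mic1
    d2 = k * d1                     # distance from source to mic2
    if d1 <= 0.0:
        return []                   # delay and energy ratio are inconsistent

    dx, dy = mic2[0] - mic1[0], mic2[1] - mic1[1]
    d = math.hypot(dx, dy)          # distance between the circle centers
    if d > d1 + d2 or d < abs(d1 - d2):
        return []                   # the circles do not intersect

    # Standard circle-circle intersection.
    a = (d1 ** 2 - d2 ** 2 + d ** 2) / (2.0 * d)
    h = math.sqrt(max(d1 ** 2 - a ** 2, 0.0))
    px, py = mic1[0] + a * dx / d, mic1[1] + a * dy / d
    return [(px - h * dy / d, py + h * dx / d),
            (px + h * dy / d, py - h * dx / d)]
```

The two returned candidates are mirror images about the line through the microphones, so additional knowledge, such as the source being in front of the pair, is needed to choose between them.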

IV. SOUND DELAY ESTIMATION
Sliding window techniques are used for the energy and delay estimation of the sound channels. The width of the sliding window and the step size are selected as 1024 samples and 10, respectively. Delays between the sound channels are calculated by GCC algorithms [25]. Two types of GCC algorithm are used in this paper: basic GCC and Generalized Cross Correlation with Phase Transform (GCC-PHAT) [26][27]. The GCC of the signals recorded by two microphones is given by

$$R_{x_1 x_2}(\tau) = \int_{-\infty}^{\infty} \psi(\omega)\, X_1(\omega)\, X_2^{*}(\omega)\, e^{j \omega \tau}\, d\omega \tag{24}$$

where $X_1(\omega)$ and $X_2(\omega)$ are the signals recorded by the first and second microphones in the Fourier domain, $^{*}$ denotes complex conjugation, and $\omega$ is the angular frequency. The weighting function $\psi(\omega)$ is designed to optimize a given performance criterion. Many different weighting functions are used in the literature. The most common one is the Phase Transform (PHAT), which is defined as

$$\psi_{\mathrm{PHAT}}(\omega) = \frac{1}{\left| X_1(\omega)\, X_2^{*}(\omega) \right|} \tag{25}$$

Three types of signal smoothing filter, including the moving average filter, are used in this study to improve delay estimation. The moving average filter is

$$y[i] = \frac{1}{N} \sum_{j=0}^{N-1} x[i + j] \tag{26}$$

where $x[\,]$, $y[\,]$, and $N$ refer to the input signal, the output signal, and the number of points used in the moving average filter, respectively [28]. The weighted moving average filter is given in (27).
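A delay estimator based on basic GCC and GCC-PHAT can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the code used in this study: the function name `gcc_delay`, the zero-padding length, and the sign convention (a positive result means the first signal lags the second) are choices made for the illustration.

```python
import numpy as np


def gcc_delay(x1, x2, phat=False, max_lag=None):
    """Estimate the delay of x1 relative to x2 in samples.

    phat=False gives the basic cross-correlation (unit weighting);
    phat=True applies the PHAT weighting 1 / |X1 * conj(X2)|.
    """
    n = len(x1) + len(x2)                  # zero-pad to avoid circular wrap
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)               # cross-power spectrum
    if phat:
        # Keep only the phase; the small floor avoids division by zero.
        cross = cross / np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)          # generalized cross-correlation
    if max_lag is None:
        max_lag = n // 2
    # Rotate so that index 0 corresponds to lag -max_lag.
    cc = np.roll(cc, max_lag)[: 2 * max_lag + 1]
    return int(np.argmax(cc)) - max_lag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal(1024)
    delayed = np.concatenate((np.zeros(7), s[:-7]))   # s delayed by 7 samples
    print(gcc_delay(delayed, s), gcc_delay(delayed, s, phat=True))
```

Restricting `max_lag` to the largest delay that the microphone spacing allows can suppress spurious peaks in noisy windows.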
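The smoothing filters can be illustrated with the following NumPy sketch. It is an assumption-laden example, not the filters of this study: the windows are centered, the weighted moving average uses triangular weights (the weights of (27) are not reproduced here), and the third filter is taken to be the median filter named in Section V. All function names are hypothetical.

```python
import numpy as np


def moving_average(x, n=5):
    """Centered n-point moving average (edges are implicitly zero-padded)."""
    kernel = np.ones(n) / n
    return np.convolve(x, kernel, mode="same")


def weighted_moving_average(x, n=5):
    """Centered weighted moving average.

    Assumption: triangular weights, e.g. [1, 2, 3, 2, 1] / 9 for n = 5.
    """
    weights = np.bartlett(n + 2)[1:-1]     # drop the zero end points
    weights = weights / weights.sum()
    return np.convolve(x, weights, mode="same")


def median_filter(x, n=5):
    """Centered n-point median filter with edge replication (n must be odd)."""
    pad = n // 2
    padded = np.pad(np.asarray(x, dtype=float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, n)
    return np.median(windows, axis=1)


if __name__ == "__main__":
    spike = np.array([0, 0, 0, 10, 0, 0, 0], dtype=float)
    print(moving_average(spike))
    print(weighted_moving_average(spike))
    print(median_filter(spike))
```

On an isolated spike, the two averaging filters spread the spike over neighbouring samples, whereas the median filter removes it entirely, which is one reason a median filter can behave differently from averaging filters on impulsive noise.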

V. ESTIMATION OF SOUND SOURCE TRAJECTORY
The estimation of the sound source trajectory is described in this section. The implementation of our approach consists of several steps. First, noise removal filters are applied to the raw sound recordings. Next, the delays between the sound channels are estimated over sliding windows. The estimated delays are then converted to azimuth angles by using (2). Finally, a first-order polynomial is fitted to determine the angle of the linear movement of the sound source versus time. The calculated delays and azimuth angle of the sound signal are shown in Fig. 6 and Fig. 7, respectively. The first-order polynomial fit of the azimuth angle is illustrated in Fig. 8. The mean calculation error of the azimuth angle for the different algorithms is shown in Table I. The filter order is selected as N = 5 for the whole process. As shown in Table I, the best combination of GCC method and pre-filter is basic GCC with the median filter. The mean difference between the estimated and real coordinate values is 8.544 cm.
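The processing steps of this section (sliding-window delay estimation, conversion to azimuth angle, and a first-order polynomial fit) can be sketched as follows. This is a minimal sketch, not the implementation of this study: the pre-filtering step is omitted, the delay is estimated with basic cross-correlation computed by FFT, the angle uses a far-field arcsine model, and the microphone spacing of 0.2 m, the speed of sound, and all function names are assumptions made for the illustration.

```python
import numpy as np


def window_delay(w1, w2, max_lag):
    """Delay of w1 relative to w2 (samples) by basic cross-correlation."""
    n = 2 * len(w1)                                   # zero-padded FFT length
    cc = np.fft.irfft(np.fft.rfft(w1, n) * np.conj(np.fft.rfft(w2, n)), n)
    cc = np.roll(cc, max_lag)[: 2 * max_lag + 1]      # lags -max_lag..max_lag
    return int(np.argmax(cc)) - max_lag


def azimuth_track(left, right, fs=44100, win=1024, step=10,
                  mic_distance=0.2, c=343.0):
    """Return window times, azimuth angles (degrees), and the linear fit."""
    # Only search lags that the microphone spacing physically allows.
    max_lag = int(np.ceil(mic_distance / c * fs))
    starts = np.arange(0, len(left) - win + 1, step)
    delays = np.array([window_delay(left[s:s + win], right[s:s + win], max_lag)
                       for s in starts])
    ratio = np.clip(delays / fs * c / mic_distance, -1.0, 1.0)
    angles = np.degrees(np.arcsin(ratio))
    t = (starts + win / 2) / fs                       # window centers, seconds
    slope, intercept = np.polyfit(t, angles, 1)       # first-order fit
    return t, angles, slope, intercept


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    s = rng.standard_normal(8192)
    left = np.concatenate((np.zeros(5), s[:-5]))      # left lags by 5 samples
    t, angles, slope, intercept = azimuth_track(left, s, step=256)
    print(slope, intercept)
```

For a stationary synthetic source, as in the usage example, the fitted slope is close to zero; for a source in linear motion, the slope describes how the azimuth angle changes over time.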

VI. CONCLUSION
In this paper, we propose a different approach for sound source localization: in contrast to common approaches, 2D sound localization is performed with a single microphone pair. Some basic parameters of sound signal processing are explained to give an idea of sound source localization, azimuth angle determination, and exact localization of sound sources. The method relies not only on the delay between the sound channels but also on the energy ratio between the recorded sound signals. The approach is implemented for an artificially created sound signal. The calculated azimuth angle is compared across different GCC methods and noise removal filters, and the optimal combination of methods is examined. The time-dependent trajectory of the mobile sound source is also calculated, and curve fitting is applied to estimate the motion of the sound source. The energy and delay parameters are used for calculating the coordinates of the sound source. The coordinates of the mobile sound source are calculated as a function of time with a mean error of 8.544 cm, and the azimuth angle for linear motion of the sound source is calculated with a small error. In future work, a real-time sound localization application with a similar approach will be implemented on embedded systems such as the Raspberry Pi.

Fig. 5. Sound signals

In this section, trajectory estimation of sound sources and determination of the coordinates are explained.

Fig. 8. Comparison of real and estimated azimuth angle

With the dual-microphone source location method, the estimated X and Y coordinate values and their curve fits versus time are shown in Fig. 9 and Fig. 10, respectively.

Fig. 10. Y coordinate change versus time

The 2D coordinate map of the sound source motion is shown in Fig. 11. As seen in the figure, the motion of the sound source estimated by the 2D sound source localization approach is from A to B.

Fig. 11. Direction of motion on coordinate map

TABLE I. CALCULATION OF MEAN ERROR OF THE AZIMUTH ANGLE