Music Note Feature Recognition Method based on Hilbert Space Method Fused with Partial Differential Equations

Abstract: The Hilbert space method is a long-established mathematical model built on linear algebra, with both theoretical value and practical applications. Its basic idea is to exploit stable relationships and dynamic dependence between variables to construct solutions of differential equations, thereby transforming analytical problems into algebraic ones. This paper first studies the denoising model used in music note feature recognition based on partial differential equations, analyzes the corresponding denoising method, and gives an algorithm for fused music note feature recognition in Hilbert space. It then reviews the commonly used music note feature recognition methods, including linear predictive cepstral coefficients, Mel frequency cepstral coefficients, wavelet transform-based feature extraction, and Hilbert space-based feature extraction, and gives the feature extraction process for each.


I. INTRODUCTION
With the continuous progress of science and technology, the development of high and new technologies such as computer technology, information technology, and microelectronics, and the continuous improvement of computer computing power, people are paying more and more attention to how to find the objective function in complex problems that can be solved by lower-order variables [1]. The traditional method can no longer meet the higher precision requirements in researching mathematical problems, and it is increasingly challenging to meet practical application requirements. Traditional methods are generally studied by analyzing variables and constructing a step response function [2]. Usually, each parameter in the differential equation needs to be derived, and the corresponding equation is obtained by using the variation relationship of partial derivatives of each variable at different orders. The Hilbert space is composed of many independent variables, and these independent variables have their corresponding equation expressions at different orders [3].
The traditional method for denoising music signals is a point-by-point iterative method with good timeliness. However, establishing the partial differential equations often requires many iterative calculations and numerical analyses, which are computationally intensive and time-consuming and make accurate results hard to obtain [4]. This paper establishes a numerical music noise model based on partial differential equations, in which the initial value problem of the partial differential equation is transformed into a numerical, iterative denoising procedure. The model is simple, easy to implement, and accurate. In this method, the continuous function in the time domain is first transformed into a higher-order system state equation, then converted into a discrete form through a series of inverse transformations, and finally processed with partial derivatives, thereby realizing the numerical analysis of partial differential equations [5]. This method can effectively handle many nonlinear problems, and its calculation accuracy is improved to a certain extent compared with other algorithms.
The traditional signal analysis techniques in common use are based on linear predictive analysis, Fourier analysis, or wavelet analysis. These techniques process the raw data by decomposing it into a linear model and then describing it with the Fourier transform [6]. They are therefore mainly suited to linear and stationary signals. Nonlinear signals are often difficult to describe effectively with a linear model, and this is where the Hilbert space method is useful [7]. The Hilbert space method is a nonlinear signal feature recognition method proposed by Huang et al. Its principle is based on the decomposition of the Hilbert space method. The SCV or FFTM technique synthesizes a linear stationary non-second-order derivative sequence signal to represent its characteristics. The Hilbert space method can be used to identify nonlinear signals and can also be applied effectively to linear signals, which gives it high practical application value.

II. RELATED WORK
In related work, a partial differential equation learning model is used to solve the challenging facial recognition problem. One proposal is a feature selection method that uses a learning model based on partial differential equations. The extracted features are more resistant to shifts in lighting conditions and can be rotated and translated without losing their integrity. Xia Miao (2021) employs the face detection algorithm in face recognition technology to first detect the face and intercept the expression data, and then calculates the head-up rate, in order to gauge students' focus in class. The expression is then scored based on the revised model of concentration analysis and evaluation of a college Chinese class, which is used to identify the expression. The concentration score is the expression score multiplied by the head-up rate. Experiments are conducted in real classrooms, and the findings are analyzed to draw conclusions and instructional recommendations [8]. After the k-nearest neighbor approach selects a large neighborhood set for each face, a sparse representation of the sample points in the neighborhood is created, combining the locality of k-nearest neighbors with the robustness of sparse representation. Using sparse reconstruction coefficients to characterize neighborhood geometry and weighted distance to characterize class dissimilarity, the sparse preserving nonnegative block alignment approach builds a discriminant partial optimization model. The two algorithms are successful and robust, as shown by accurate clustering and recognition results across a wide range of conditions, including both real and simulated occlusion. The study validated the model through in-class practice assessments, teachers' inquiries, and interviews with students and teachers. The outcomes demonstrate the validity and trustworthiness of the proposed combined evaluation approach based on expression and head-up rate.
As a preprocessing step in a wide variety of applications, such as sound separation and musical note transcription, musical pitch estimation is used to identify the pitch of a musical note or the fundamental frequency (F0) of an audio stream. Based on the categorization framework, Tamboli (2019) creates a neural network optimized for musical note recognition (OBNN). The strategies for identifying musical notes were identified after reviewing a variety of surveys and studies. Here, an OBNN is utilized to identify musical pitches. Similarly, by utilizing various approaches, we can improve the efficiency of musical note recognition [9]. The most recent studies on musical note identification are successfully summarized here, along with the characteristics and categorizations gleaned from those studies.
Automatic speech/music classification employs various signal processing methods to sort audio/visual files into predefined categories. In order to categorize incoming audio signals into speech/music signals, Arvind Kumar's (2022) suggested work investigates Hilbert Spectrum (HS) obtained from various AM-FM components of an audio signal, also known as Intrinsic Mode Functions (IMFs). Hilbert Transform of the IMFs yields a two-dimensional representation of the HS, a map of instantaneous energy (IE) and frequencies (IF). Via creating unique IF and Instantaneous Amplitude (IA) based cepstral features, we subject this HS to a Mel-filter bank and Discrete Cosine Transform (DCT). Three datasets (Slaney Database, GTZAN Database, and MUSAN Database) were used to validate the results. Extensive experiments were undertaken on various combinations of audio files from the S&S, GTZAN, and MUSAN databases to evaluate the broad applicability of the proposed characteristics, and positive results were attained. Finally, the system's performance is compared to previously implemented cepstral features and other related efforts [10].
Integrating AI with deep music for recommendations has been a growing area of study in recent years. Deep learning is a complicated machine learning technique that may infer value laws from features observed in training samples. The proliferation of deep learning networks is key to the future of AI and offers a fresh perspective on music score identification. Qin Lin's (2022) paper utilizes the enhanced deep learning algorithm to study music score recognition. To achieve feature extraction and intelligent recognition of music scores, we build on the foundation of the classic neural network by introducing the attention weight value improved convolutional neural network (CNN) and the high execution efficiency deep belief network (DBN). A CNN&DBN-based feature learning method was developed for music score extraction using the feature vector set extracted by CNN&DBN as input [11]. Experimental results show that the proposed model effectively recognizes a wide range of polyphony music types, with improved recognition and performance; the improved algorithm applied to soundtrack identification achieves a recognition rate of up to 98.4%, which is significantly higher than those of other classic algorithms. It demonstrates the massive potential for study in the field of music retrieval using deep learning and provides data support for building a knowledge graph in the music field.
It is a common goal in engineering and computer science to give machines sensing abilities on par with those of humans. Much work has been done to give computers the ability to collect, process, evaluate, and understand their environment in the same ways humans do. Explicitly referring to the auditory system, machine hearing is the capacity of computers to perceive their acoustic surroundings in the same way humans do. A proper audio signal representation is crucial to accomplishing this lofty goal. This study by Alas F et al. (2016) provides a comprehensive overview of the most recent advances in audio feature extraction methods for analyzing standard audio signals like voice, music, and environmental noise [12]. For the sake of thoroughness, the writers revisited old methods and included the most recent developments based on new fields of research and unique bio-inspired recommendations. These methods are classified in a taxonomy that groups them by their physical or perceptual underpinnings and then further subdivided by the type of computing they do (time, frequency, wavelet, image-based, cepstral, or other domains). The methods are described, and recent applications to issues with machine hearing are provided as illustrative instances.
Since its introduction, the Hilbert-Huang transform has been widely used thanks to its advantages in several different contexts. The Hilbert spectrum accurately reflects the distribution of signal energy over multiple scales. Using the Hilbert energy spectrum, which characterizes the distribution of instantaneous energy, Li X (2011) proposes a novel feature dubbed ECC. The experimental findings show that ECC performs better than the conventional short-term average energy [13]. Combining ECC with Mel frequency cepstral coefficients (MFCC) provides a more detailed picture of the energy distribution over both the time and frequency domains, and this feature set outperforms the combination of short-term average energy, pitch, and MFCC in recognition accuracy. Improved ECC variants are then created: combining ECC with the Teager energy operator gives TECC, and adding the instantaneous frequency to the energy gives EFCC. Seven different emotional states are tested, with boredom having the highest detection rate (83.57%) and the highest categorization accuracy (100%). In numerical tests, the proposed features ECC, TECC, and EFCC were shown to significantly enhance speech emotion recognition performance.
Film video noise is commonly understood as digital signal system errors manifested as artifacts in the video image. Videos captured with different cameras will always have some degree of this distortion. The primary purpose of noise reduction is to lessen the amount of distracting background noise in a video while allowing the image's edges and textures to come through clearly. Pingli Sun et al. (2021) provide a comprehensive explanation of the space-time noise reduction filter's workings, along with the development of a 3D-filter algorithm for Gaussian noise, an enhanced 3D-filter algorithm for mixed noise based on the 3D-BDP (bloom-deep-split) filter, and a filter algorithm for luminance and color noise in dimly lit scenes [14]. They build a novel iterative denoising algorithm by deconstructing the PDE denoising process. Partial differential equations can be thought of as an iterative denoising of the filter. The new algorithm's initial stage employs a wavelet-domain adaptive Wiener filter as its filtering foundation, with promising results achieved by careful tuning of the filter's parameters. Analysis findings demonstrate that the model proposed in this section can efficiently eliminate multiplicative noise compared to the existing denoising model. The experimental report demonstrates that, compared to the partial differential equation method for denoising, the algorithm's parameters have some stability and can obtain satisfactory processing outcomes for many images. Using the proper partial differential equation approach, the pseudo-Gibbs are eliminated in the second step of the algorithm, greatly enhancing its performance. 
After applying the new algorithm to a Gaussian-noise-filled image, the pseudo-Gibbs effect, which frequently occurs in wavelet denoising, and the step effect, which occurs in partial differential equation denoising, are both eliminated, details are better preserved, the peak signal-to-noise ratio is improved, and numerous experiments demonstrate the algorithm's efficacy as a denoising method.

III. DENOISING MODEL OF MUSICAL NOTE FEATURES BASED ON PARTIAL DIFFERENTIAL EQUATIONS
The music note feature denoising method based on partial differential equations mainly uses the total variation (TV) denoising method to identify the noise feature. The principle of this method is to transform the denoising problem into an extremum problem: finding the minimum of an energy functional established under two constraints. The basic idea is to transform the extremum problem with a time window constraint function into an unconstrained problem on the signal and solve it in the time domain, then obtain the actual non-stationary random process through iterative transformation, and finally realize the estimation of the noise spectrum [15].
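For orientation, the variational formulation described above is usually written in the Rudin-Osher-Fatemi (ROF) form. The block below is a sketch that assumes this standard form. The symbols $u_0$ (noisy note feature), $u$ (denoised feature), $n$ (additive noise with variance $\sigma^2$), $\Omega$ (signal domain), and $\lambda$ (regularization parameter) are notation introduced here for illustration and may differ from the notation of the paper's own formulas.

```latex
\begin{align*}
  % additive noise model
  u_0 &= u + n \\
  % total variation minimized under two constraints (mean and variance of the noise)
  \min_{u}\ \int_\Omega |\nabla u|\,dx
  \quad &\text{s.t.}\quad
  \int_\Omega u\,dx = \int_\Omega u_0\,dx,\qquad
  \frac{1}{|\Omega|}\int_\Omega (u-u_0)^2\,dx = \sigma^2 \\
  % unconstrained energy functional
  E(u) &= \int_\Omega |\nabla u|\,dx + \frac{\lambda}{2}\int_\Omega (u-u_0)^2\,dx \\
  % Euler-Lagrange equation of E(u)
  0 &= -\nabla\!\cdot\!\left(\frac{\nabla u}{|\nabla u|}\right) + \lambda\,(u-u_0) \\
  % gradient descent flow
  \frac{\partial u}{\partial t} &= \nabla\!\cdot\!\left(\frac{\nabla u}{|\nabla u|}\right) - \lambda\,(u-u_0)
\end{align*}
```

The last line is the evolution equation that an explicit, implicit, or semi-implicit difference scheme would discretize.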
Before studying the denoising algorithm, the noise reduction model needs to be studied first. The noise reduction model is expressed as formula (1). Based on this model, a global variational model on the space of functions of bounded variation has been proposed; its definition is shown in formula (2), which satisfies constraints (3) and (4). Noise usually makes the total variational energy of the acquired musical note features large, which makes the features difficult to identify accurately. The noise reduction model optimizes the total variational model of the musical note features, that is, it minimizes the total variational energy, so that the temporal and spatial dimensions of the music can be identified accurately. Denoising musical note features therefore amounts to minimizing the energy functional, which can be expressed as formula (5). The gradient descent flow equation corresponding to the variational problem in Equation (6) is shown in Equation (7). TV flow is stable and has a globally optimal solution. It is a diffusion function between forward diffusion and progressive diffusion, and it yields a better diffusion process and better preservation of the feature information of musical notes [16]. The numerical realization of the TV model usually uses one of three difference schemes: explicit, implicit, and semi-implicit. In the implicit and semi-implicit schemes, the new iterate appears on both sides of the update, which makes each iteration of the noise removal process more complicated to compute, and their convergence rate is relatively slow. Compared with these two, the numerical implementation using the explicit difference scheme has a much faster convergence rate [17]. The explicit scheme is therefore the one in common use, and it is implemented as follows. The explicit scheme is shown in formula (8). Substituting difference quotients for the partial derivatives gives formula (9), in which u^n is the musical note feature after the nth diffusion and Δt is the time interval or time step. The scheme is a recursive relationship built from first-order and second-order differences, characterized in that the continuous wave signals in the upper and lower columns are both continuous. Based on this feature, good results are obtained after analyzing the fusion of partial differential equations by establishing simple equations, differential equations, and related methods.
In denoising music note features, to improve processing efficiency, it is necessary to add regularization parameters in the iterative process. Of course, in continuous iteration, the regularization parameters need to be continuously updated to achieve better results.
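The explicit scheme discussed in this section can be illustrated with a minimal one-dimensional sketch. It assumes the standard TV gradient descent flow with a fidelity term weighted by a regularization parameter. The function names `tv_denoise_1d` and `total_variation`, the parameters `lam`, `dt`, `eps`, and `n_iter`, and their default values are all hypothetical choices made for this illustration, not values taken from the paper.

```python
import numpy as np

def tv_denoise_1d(u0, lam=0.1, dt=0.02, eps=0.1, n_iter=500):
    """Explicit TV flow for a 1-D signal (illustrative sketch, assumed parameters).

    Each step applies u <- u + dt * (div(u_x / |u_x|) - lam * (u - u0)),
    with |u_x| regularized as sqrt(u_x^2 + eps^2) to avoid division by zero.
    """
    u0 = np.asarray(u0, dtype=float)
    u = u0.copy()
    for _ in range(n_iter):
        # Forward difference; the appended sample makes the last difference zero
        # (zero-flux boundary on the right).
        ux = np.diff(u, append=u[-1])
        # Regularized unit gradient.
        flux = ux / np.sqrt(ux ** 2 + eps ** 2)
        # Backward difference of the flux (discrete divergence),
        # with zero flux assumed on the left boundary.
        div = np.diff(flux, prepend=0.0)
        # Explicit update: diffusion term plus fidelity term.
        u = u + dt * (div - lam * (u - u0))
    return u

def total_variation(u):
    """Discrete total variation: sum of absolute first differences."""
    return float(np.sum(np.abs(np.diff(u))))
```

In this sketch the time step is kept well below `eps / 2`, which is the usual stability bound for an explicit scheme whose diffusion coefficient is at most `1 / eps`. A fixed `lam` is used for simplicity; updating it during the iteration, as described above, would be a straightforward extension.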

IV. MUSIC NOTE FEATURE RECOGNITION BASED ON THE HILBERT SPACE METHOD
The relational expression between each order in Hilbert space is linear, and each inter-order variable function is described and defined by itself to form a set of vector groups. Under this method, many problems can be used as time series models for solution analysis and forecasting. However, combining time series with variables in practical applications is often necessary, which is also a significant feature of the Hilbert space method [18].
The Hilbert spectrum has the following characteristics:
• It can describe random processes in which two variables are linearly independent of each other, and it has high-order convergence and stability. Therefore, the Hilbert space method can be used to study the linear correlation between variables in a multi-period non-stationary single system.
• It can be used for analysis when non-deterministic and unstable states appear in the Hilbert space equation. Non-deterministic and unstable states can be combined to build a new model to solve. In practical observation, Hilbert space can be used for analysis under uncertainty and steady state instead of simply solving differential equations.
The principle of music note feature recognition is to analyze and compare the feature vectors corresponding to different note definition domains and consecutive states. It can analyze the position relationship of different notes by calculating the matrix of words corresponding between functions and feature vectors to analyze different notes and then get different feature vectors, also known as the Hilbert space method. Based on its recognition principle, its implementation model can be obtained, as shown in Fig. 1.

A. Hilbert Space Method
The Hilbert space method is a new method applicable to nonlinear and non-stationary signal processing. It can be used to study the decomposition and fitting of nonlinear problems and is of great importance in many practical applications. The method consists of two steps. First, the signal is decomposed by empirical mode decomposition (EMD), which yields a set of single-component intrinsic mode functions (IMFs). The IMFs effectively reflect the internal vibration patterns of the signal. Second, each IMF is subjected to the Hilbert transform and Hilbert spectrum analysis. The results of the Hilbert spectrum analysis are transformed into the corresponding state curves, which gives the Hilbert space method with instantaneous time series [19].

B. Empirical Mode Decomposition
One of the essential concepts in the Hilbert space method is the decomposition into modes. In a broad sense, any new definition can be achieved using elements of different types or properties. For most mathematical problems, more complex functions are usually studied with both descriptive and approximation-type methods, whereas the Hilbert space method uses a special type of modal decomposition: a continuous function is discretized into several components, which are then recombined into a new set of definitions [20]. EMD is the key step of the Hilbert space method and can be regarded as the screening (sifting) process for obtaining IMFs. The rationality and effectiveness of the EMD process are based on the following two points:
• Any complex signal s(t) can be represented as n components whose instantaneous frequencies have actual physical meaning. In Hilbert space, each independent period can be expressed as n different frequencies, so the complex signal s(t) is represented as shown in formula (10).
• The termination condition of the screening is the definition of IMF.
Based on the above, an IMF is a signal (function) that is symmetric with respect to the local zero mean, and whose number of extrema and number of zero crossings are equal or differ by at most one. The graphical representation of an IMF is shown in Fig. 2.
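The counting condition in the IMF definition can be checked numerically. The sketch below is illustrative only: the helper names `count_extrema`, `count_zero_crossings`, and `looks_like_imf` are hypothetical, and the check uses the common form of the condition (counts equal or differing by at most one). It does not test the second condition, symmetry about the local zero mean.

```python
import numpy as np

def count_extrema(x):
    """Count local maxima and minima via sign changes of the first difference."""
    d = np.diff(np.asarray(x, dtype=float))
    return int(np.sum(d[:-1] * d[1:] < 0))

def count_zero_crossings(x):
    """Count sign changes between consecutive samples."""
    x = np.asarray(x, dtype=float)
    return int(np.sum(x[:-1] * x[1:] < 0))

def looks_like_imf(x):
    """True if the numbers of extrema and zero crossings differ by at most one.

    Only the counting condition is checked; envelope symmetry is not.
    """
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1
```

For example, a pure sinusoid passes this check, while the same sinusoid shifted by a constant offset larger than its amplitude fails it, because the shifted signal keeps its extrema but never crosses zero.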

C. EMD Filtering
The Hilbert transform works best on IMF components, and a single-component IMF is integral to the Hilbert analysis process. However, most musical note signals are not IMFs, so a straightforward Hilbert transform of such a signal cannot give a complete description of its frequency content. The signal therefore has to be decomposed into IMFs first [21]. The decomposition method that can effectively produce a group of IMFs is EMD. Its essence is to determine the intrinsic vibration modes of a signal from its characteristic time scales and to define the characteristic frequency through each intrinsic vibration mode, so that the modes can be classified. The signal is thus decomposed into lower-order and higher-order dynamic states with different eigenfrequencies and position functions. For the high-dimensional discrete spectral estimation problem, enough time windows are required to ensure that the computational results converge to a suitable range and to obtain a more accurate data set and its modal parameters (e.g., step length, step width), which is where the EMD method can have advantages in dealing with complex nonlinear models.
Based on the definition of IMF, the filtering process can be performed using the envelope formed by the signal's local maxima and local minima, respectively. Local maxima of a signal mean decomposing a stochastic process containing all non-zero elements into a series of a small number of components and performing an exact calculation in each component to obtain an estimate of the objective function [22]. However, a definition is given for the Hilbert space: "Based on the characteristic equation (FME) algorithm proposed is a new method-Rython." The FME algorithm is an essential branch of Hilbert space, which can be used to solve signals and parameters. It has a wide range of applications in analyzing, estimating, and predicting time variation. The mean value of the upper and lower envelope of the original signal s(t) is denoted as m1(t). Then the difference between s(t) and m1(t) is the first component, denoted as h1(t), as shown in formula (11).
In the second screening, h1(t) is treated as the original signal and the same method is applied, giving formula (12).
The screening process is then repeated k times in the same way until h1,k(t) satisfies the conditions of an IMF and becomes the first IMF component. This process is expressed in formula (13).
Let IMF1(t) = h1,k(t), so that IMF1(t) is the first IMF component screened from the original signal s(t); we refer to this level of screening as inner-level screening. The inner-level screening process requires a screening termination criterion, which is an essential inner-level condition that determines whether the inner-level sequence can be extrapolated. This criterion can be bounded by the standard deviation (SD) between two successive screening results. SD is defined by formula (14). In summary, the standard EMD decomposition flow chart is shown in Fig. 3.
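The inner-level screening loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: envelopes are built by linear interpolation with `numpy.interp` rather than the cubic splines normally used in EMD, and the stopping value is an energy-normalized variant, sum((h_prev - h_new)^2) / sum(h_prev^2), which may differ from the exact form of formula (14). All function names and the threshold 0.25 are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Indices of strict local maxima and minima of a 1-D array."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] > x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] < x[i + 1])]
    return maxima, minima

def envelope_mean(x):
    """Mean of the upper and lower envelopes, m(t).

    Linear interpolation stands in for cubic splines (assumption of this sketch).
    Returns None when there are too few extrema to build envelopes.
    """
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    n = np.arange(len(x))
    upper = np.interp(n, maxima, x[maxima])
    lower = np.interp(n, minima, x[minima])
    return 0.5 * (upper + lower)

def sd_criterion(h_prev, h_new, eps=1e-12):
    """Energy-normalized difference between two successive screening results."""
    return float(np.sum((h_prev - h_new) ** 2) / (np.sum(h_prev ** 2) + eps))

def sift_first_imf(s, sd_threshold=0.25, max_iter=50):
    """Repeat h_k = h_{k-1} - m_k until the SD value drops below the threshold."""
    h = np.asarray(s, dtype=float)
    for _ in range(max_iter):
        m = envelope_mean(h)
        if m is None:
            break
        h_new = h - m
        sd = sd_criterion(h, h_new)
        h = h_new
        if sd < sd_threshold:
            break
    return h
```

As a usage example, sifting a sampled sinusoid with a constant offset removes the offset in the first pass, since both envelopes are flat and their mean equals the offset. Subtracting the returned component from s(t) and repeating the procedure on the residual would give the outer-level loop of the full decomposition.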
As described in Section IV-A, the Hilbert space method first decomposes the signal into IMFs by EMD and then applies the Hilbert transform and Hilbert spectrum analysis to each IMF; the result of the Hilbert spectrum analysis is transformed into the corresponding state curve, which gives the Hilbert space method with instantaneous time sequence [23]. Because each IMF is a single-component signal, this decomposition guarantees that the Hilbert transform and the calculation of the instantaneous frequency are valid.
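The Hilbert transform step applied to each IMF can be sketched with a frequency-domain construction of the analytic signal, from which the instantaneous amplitude and instantaneous frequency follow. This is a minimal illustration; the function names `analytic_signal` and `instantaneous_attributes` and the sampling rate argument `fs` are introduced here for the example and do not come from the paper.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal z = x + j*H[x], built by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def instantaneous_attributes(imf, fs):
    """Return instantaneous amplitude and instantaneous frequency (Hz) of an IMF."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # The derivative of the phase, scaled to Hz; one sample shorter than the input.
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency
```

For a single-component input such as a pure cosine, the instantaneous frequency returned by this sketch is constant and equal to the tone frequency, which is the behavior the IMF condition is meant to guarantee.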
The Hilbert marginal spectrum (HMS) and the instantaneous energy density level are the most commonly used and most representative quantities in the Hilbert space. The algorithm obtains the model parameters by selecting them and then transforming the complex problem into a simple mathematical one. The Hilbert marginal spectrum is a nonlinear multi-model with high prediction accuracy; the algorithm can transform a complex problem into a linear one and accurately estimate its approximate solution, as shown in formula (16). The energy density level is the most commonly used and most representative model parameter in Hilbert space. It can be obtained directly and can also be used for solving, and it is widely used in various complex problems. Its representation is shown in formula (17).
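For reference, in the standard Hilbert-Huang formulation the quantities named above are usually written as follows. This is a sketch that assumes formulas (16) and (17) follow that standard form. Here $a_j(t)$ and $\omega_j(t)$ denote the instantaneous amplitude and frequency of the $j$-th IMF, $H(\omega,t)$ the Hilbert spectrum, and $T$ the signal duration; this notation is introduced for illustration and may differ from the paper's.

```latex
\begin{align*}
  % signal expressed through the instantaneous amplitude and frequency of its IMFs
  s(t) &= \operatorname{Re}\sum_{j=1}^{n} a_j(t)\,
          \exp\!\Big(i\int \omega_j(t)\,dt\Big) \\
  % Hilbert marginal spectrum: total amplitude contribution of each frequency
  h(\omega) &= \int_{0}^{T} H(\omega,t)\,dt \\
  % instantaneous energy density level: energy fluctuation over time
  IE(t) &= \int_{\omega} H^{2}(\omega,t)\,d\omega
\end{align*}
```

The marginal spectrum integrates the Hilbert spectrum over time, while the instantaneous energy density level integrates its square over frequency.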
The commonly used feature extraction methods are linear predictive cepstral coefficients, Mel frequency cepstral coefficients, wavelet transform-based feature extraction methods, and Hilbert space method-based feature extraction methods.

Feature extraction based on linear prediction cepstral coefficients
The feature extraction of linear predictive cepstral coefficients is a core problem in the whole multivariate statistical analysis process. In traditional regression methods, single-factor models are usually used to perform dimensionality reduction. However, as the range of applications widens, computational conditions improve, and data processing becomes increasingly complex, it is challenging to meet the actual accuracy requirements, and the variance estimation method cannot be used directly for multiple linear prediction verification on experimental samples. Therefore, a new linear predictive cepstral coefficient model is proposed, which reduces the complexity of data processing by extracting feature space information with more implied parameters and higher dimensionality [24]. The feature extraction process of linear predictive cepstral coefficients is shown in Fig. 4.
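As a point of comparison, the textbook linear predictive cepstral coefficient pipeline (windowing, autocorrelation, Levinson-Durbin recursion, then the LPC-to-cepstrum recursion) can be sketched as below. This is the conventional procedure, not necessarily the new model proposed above; the function names, the Hamming window, and the orders used are assumptions made for this illustration. The predictor convention is x_hat[n] = sum_k a[k] * x[n-k], and the frame is assumed to be non-silent so that the prediction error stays positive.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    frame = np.asarray(frame, dtype=float)
    return np.array([np.dot(frame[:len(frame) - k], frame[k:])
                     for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns predictor coefficients a[1..order]."""
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= (1.0 - k * k)
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of 1 / (1 - sum_k a[k] z^-k) via the standard recursion."""
    p = len(a)
    c = np.zeros(n_ceps + 1)           # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lpcc(frame, order=10, n_ceps=12):
    """LPCC feature vector of one frame (Hamming window is an assumed choice)."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = autocorrelation(windowed, order)
    return lpc_to_cepstrum(levinson_durbin(r, order), n_ceps)
```

For a first-order model with coefficient 0.5, the recursion reproduces the series 0.5^n / n, which is a convenient sanity check for the sign convention.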

1) Feature extraction based on Mel frequency cepstrum coefficients:
The Mel frequency cepstrum coefficient is a nonlinear feature whose parameters change with time, so its characteristics must be processed in some way before linear regression analysis is performed. However, this approach has some problems. First, it needs to calculate a large number of offline fitted curve weights and use one of them as the standard deviation, so it is computationally intensive, and the rate of change of the offline fitted curve weights varies over time; it therefore cannot accurately describe the distribution of linear feature points on a straight line. Second, a series of complex processes, such as obtaining new parameters, may be required before it can be used to maximize the effective extraction rate. These are essential elements to be studied for Mel coefficients. The feature extraction process of Mel frequency cepstrum coefficients is shown in Fig. 5.
2) Study of wavelet transform-based feature extraction method: As an extension of time-frequency local features in time and space, the wavelet transform is a newly developed time-scale analysis method. It decomposes the input into sub-bands, each with its own coefficients (i.e., distance resolution), to deal with the noise component, given that the signal contains some high-frequency components. At the same time, it can effectively eliminate the low-frequency part. Moreover, the wavelet transform is also a time-frequency local feature analysis method: it can extract the high-frequency components in time, space, and scale to analyze the local signal information more accurately. The feature extraction process of the wavelet transform is shown in Fig. 6 [25].
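The sub-band decomposition behind wavelet-based feature extraction can be illustrated with the Haar wavelet, the simplest orthonormal case. The sketch below is an assumption-laden example: the choice of the Haar wavelet, the use of sub-band energies as features, and the function names `haar_dwt`, `haar_idwt`, and `wavelet_energy_features` are all made here for illustration and are not specified in the paper.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar transform: low-pass (approx) and high-pass (detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])        # repeat the last sample for odd lengths
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt for even-length inputs."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def wavelet_energy_features(x, levels=3):
    """Energy of each detail sub-band plus the final approximation sub-band."""
    feats = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(float(np.sum(detail ** 2)))
    feats.append(float(np.sum(approx ** 2)))
    return np.array(feats)
```

Because the Haar transform is orthonormal, the sub-band energies of an input whose length is divisible by 2^levels sum to the energy of the input, so the feature vector describes how the signal energy is distributed across scales.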
3) Feature extraction method based on Hilbert space method: The Hilbert space classification algorithm is based on the linear discriminant method of feature extraction, which builds on the traditional minimum distance classification method combined with a variety of statistical decision theories to recognize different dimensions in the sample. The method predicts the target region by training on sample data; however, it is affected by the large amount of data and by noise interference, so further improvements are needed. First, the original sample point information is multiplied and fitted with other discrete cosine matrices. Second, the corresponding coordinate transformation coefficients are maximized or minimized according to the feature vectors of the classified objects. Finally, the linear discriminant method is used for parameter estimation, and the final output is combined with the Hilbert space classification method to improve the efficiency of feature extraction.
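The minimum distance classification step that the method above builds on can be sketched as a nearest-centroid classifier over feature vectors. This covers only that baseline step, not the discrete cosine fitting or linear discriminant stages; the function names and the note labels in the usage example are hypothetical.

```python
import numpy as np

def fit_centroids(features, labels):
    """Compute one mean feature vector (centroid) per class."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_min_distance(features, classes, centroids):
    """Assign each feature vector to the class with the nearest centroid."""
    features = np.atleast_2d(np.asarray(features, dtype=float))
    # Euclidean distance from every sample to every centroid.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Usage with made-up two-dimensional features for two note classes.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["C4", "C4", "A4", "A4"]
classes, centroids = fit_centroids(train_x, train_y)
predicted = predict_min_distance([[0.2, 0.3], [5.0, 4.0]], classes, centroids)
```

In practice the feature vectors would be the Hilbert-based features of each note, and the Euclidean distance could be replaced by a weighted distance if the features have very different scales.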

V. CONCLUSION
To sum up, with the development of computer technology, artificial intelligence has become a trend driven by technological progress in the new era. With the introduction of modern technology, the traditional method of extracting music note features can no longer meet the needs of modern technological development. The feature recognition method integrating partial differential equations and the Hilbert space method has become an inevitable development of technology in the new era. In this paper, we propose a TV denoising model using partial differential equations: we establish its energy functional, obtain its Euler-Lagrange equation by the variational method, and find its numerical solution by the gradient descent flow method, which effectively protects the detailed information of music note features. In addition, when the Hilbert space method is used for feature extraction and identification, nonlinear and non-stationary feature signals can be identified effectively. Feature identification with the Hilbert space method is easier and faster, and the information in the feature signal can be studied accurately.