Efficient Function Integration and a Case Study with Gompertz Functions for Covid-19 Waves

Numerical algorithms are widely used in different applications; therefore, the execution time of the functions involved in them is important and, in some cases, decisive, for example in machine learning algorithms. Given a finite set of independent functions A(x), B(x), ..., Z(x) with domains defined by disjoint, consecutive, and not necessarily adjacent intervals, the main goal is to integrate them into a single function F(x) = k1×A(x) + k2×B(x) + … + kn×Z(x), where each activation coefficient ki is one if x is in the interval of the respective domain and zero otherwise. The novelty of this work is the presentation and formal demonstration of two general forms of integrating functions into a single function: the first is the mathematical version, and the second is the computational version (with the AND function at the bit level), which is characterized by its efficiency. The result is applied in a case study (Peru), where two regression functions were obtained that integrate all the waves of Covid-19, that is, the epidemic curves of the global number of deaths/infected per day. The adjustment provided a highly statistically significant measure of correlation: a Pearson's product-moment correlation of 0.96 and 0.98, respectively. Finally, the size of the epidemic was projected for the next 30 days.


I. INTRODUCTION
Numerical algorithms are important in different applications. They are made up of loops/iterations that contain functions, for example, the cost function in machine learning algorithms.
On the other hand, on some occasions the integration of functions is necessary, that is, the union of independent functions G_1(x), G_2(x), …, G_n(x), each one defined on a different domain. The domains are defined by disjoint, consecutive, and not necessarily adjacent intervals. Therefore, the objective is to present new procedures for integrating functions into a single function F(x) = k_1×G_1(x) + k_2×G_2(x) + … + k_n×G_n(x), where k_i, with 1 ≤ i ≤ n, is the activation coefficient, that is, it is one if x is in the domain of G_i(x) and zero otherwise.
What follows is to apply the results in a case study, that is, the integration into a single function of the different functions that represent the waves of the coronavirus disease (Covid-19). This emergency situation has made it a very important research topic in the entire scientific community [1]-[3].
Covid-19 has caused deaths and infections since it began. Countries are going through the third and fourth waves and it is not known whether others are coming, so it is necessary to build a general regression function for an unlimited number of waves.
The Gompertz model represents sigmoidal behaviour and is suitable for representing the spread of Covid-19. Epidemiologists, biologists, and others use this model for its advantages. There is a detailed review of the Gompertz model in [4]. The study therefore focuses on understanding the cumulative global number of deaths/confirmed cases by applying the Gompertz model to several waves, that is, by building an integrated regression function with one Gompertz function (G_1(x), G_2(x), …) for each wave and predicting future behaviour. Predictions are important for decision-making in the political, economic, and other fields [5].
Artificial intelligence and its machine learning methods have been applied in different areas, with better results than traditional methods such as regression with the normal equation [6]-[11]. In our work, a program was developed to carry out regression using machine learning. Peru, one of the countries with the highest mortality rate per inhabitant, was selected as the case study.
The main part of the research is made up of theoretical results, which are permanent because they rest on mathematical demonstration. The application part, whose purpose is to highlight the importance and usefulness of the results, is limited to a case study (Peru).
Finally, the results obtained can be applied to different functions without restriction (for example, epidemiological functions such as Logistic, Bertalanffy, Boltzmann, etc., or a combination thereof), to any data set (for example, Covid-19 data from other countries), to an unlimited number of functions (e.g., various waves of Covid-19), and in general to all applications that require the integration of functions.
The rest of this research is organized as follows: Section II reviews the related work. Section III explains the methodology. Section IV describes the principal results, while Section V discusses the case study and applications of the proposed results. Lastly, Section VI summarizes the main conclusions of this work.

II. RELATED WORK
Aferni et al. used a basic way of integrating two functions for two consecutive waves of Covid-19. The authors do not present a general way to integrate functions, and their result cannot be generalized to more than two functions. In their integrated function F(x), values of zero and one are used for p, and the sigmoidal-Boltzmann mathematical model was applied to study the spread of Covid-19 in 15 different countries [4].
As far as we know, there is no other work related to the integration of functions. On the other hand, there are several research works regarding the spread of Covid-19 [12]-[14].

III. METHODOLOGY
Mathematical proof is the primary form of justification for mathematical knowledge [15]. It is a formal and rigorous method whose validity is permanent. It has no margin of error and is not subject to the assumptions of statistical methods. For this reason, it was used in the first and main part of the investigation.
In order to highlight the importance and usefulness of the theoretical results, in the second part a regression function composed of several Gompertz functions was built, specifically for the Covid-19 data (Peru). Linear regression was used with a correlational hypothesis. Linear regression is still a useful and widely used statistical method [16].

A. Data
The data used were the global numbers of cumulative deaths/infections from day one to June 27, 2022, which are freely downloadable from the Johns Hopkins University Resource Center repositories.

B. Inflection Points and Second Derivative
Let $f(x)$ be a function that is continuous at a point $x_0$; $f(x)$ can have a finite or infinite derivative at that point. If, when passing through $x_0$, the function changes the direction of convexity, then $x_0$ is called a point of inflection [17].
Second derivative of a function $f(x)$ [17], [18]. If $x_0$ is a point of inflection, and the function has a second derivative in some neighborhood of $x_0$ that is continuous at the point itself, then $f''(x_0) = 0$.
For example, in the case of Italy, the start of the second wave (inflection point) was calculated with the following procedure. First, a third-degree polynomial regression was performed between day 75 and day 250, because this is the interval where the change point is found (the regression function is illustrated in blue in Fig. 1). Second, the inflection point was calculated using the second derivative. Finally, a red vertical line was drawn to highlight the start day of the second wave. The cubic regression function was obtained with the R statistical software, and the inflection point was obtained by calculating its second derivative and solving the equation $f''(x) = 0$ with the free Maxima software. Then, nonlinear regression can be performed using the Gompertz function or another function for each wave.
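As a minimal sketch of the second-derivative step, assume the fitted cubic has the form f(x) = a3·x³ + a2·x² + a1·x + a0. The coefficient names and the numeric values below are hypothetical and are not the fitted values for Italy. Under that assumption f''(x) = 6·a3·x + 2·a2, which vanishes at x0 = -a2 / (3·a3):

```python
def cubic_inflection_point(a3, a2):
    """Return the root of f''(x) = 6*a3*x + 2*a2 for a cubic whose
    x^3 coefficient is a3 and whose x^2 coefficient is a2."""
    if a3 == 0:
        raise ValueError("not a cubic: the x^3 coefficient is zero")
    return -a2 / (3.0 * a3)


# Hypothetical cubic (x - 100)^3 = x^3 - 300x^2 + 30000x - 1000000,
# chosen only because its inflection point is easy to verify by hand.
inflection_day = cubic_inflection_point(1.0, -300.0)
```

The lower-order coefficients a1 and a0 do not appear because they disappear after differentiating twice.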

C. The Gompertz Model
The Gompertz curve/function/model is a function for a time series, named after Benjamin Gompertz (1779-1865), who was born in the City of London [19]-[22]. It is a special case of the generalized logistic function. In [23] the Gompertz function is described, classified, and explained. It has different variants; the best known is the following:
$$G(x) = a\,e^{-b\,e^{-c x}}$$
It is also represented with the exp function as follows:
$$G(x) = a \cdot \exp(-b \cdot \exp(-c \cdot x))$$
where G(x) is the expected value (e.g., deaths) as a function of time x (for example, days since the first case), a is the upper asymptote, b sets the displacement along the x-axis (translates the graph to the left or right), c is the growth-rate coefficient (which affects the slope), e is Euler's number (e = 2.718281828459045), and exp(x) is e^x.
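The Gompertz function in its best-known three-parameter form, G(x) = a·exp(-b·exp(-c·x)), can be sketched in a few lines. The function name and the parameter values in the example are hypothetical and are not fitted values from the study:

```python
import math


def gompertz(x, a, b, c):
    """Gompertz curve G(x) = a * exp(-b * exp(-c * x)).

    a: upper asymptote, b: displacement along the x-axis,
    c: growth-rate coefficient.
    """
    return a * math.exp(-b * math.exp(-c * x))


# Hypothetical parameters, for illustration only.
early = gompertz(10, a=1000.0, b=5.0, c=0.05)
late = gompertz(400, a=1000.0, b=5.0, c=0.05)
```

For positive b and c the curve increases with x and approaches the asymptote a, so `late` is close to 1000 while `early` is much smaller.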

D. Fit Assessment Measures
The evaluation measures, known as goodness-of-fit measures, that will be used in the research are the Pearson correlation coefficient (R) and the determination coefficient (R²).
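Both measures can be computed directly from the observed series and the fitted values. The sketch below uses hypothetical function names and assumes the determination coefficient is taken as the square of the Pearson coefficient, which is how the explained variance relates to R in the reported results for deaths:

```python
import math


def pearson_r(observed, fitted):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return cov / math.sqrt(var_o * var_f)


def explained_variance(observed, fitted):
    """Determination coefficient, taken here as the square of Pearson's R."""
    return pearson_r(observed, fitted) ** 2
```

The function fails with a division by zero if either series is constant, since the correlation is undefined in that case.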

IV. RESULTS
To integrate several Gompertz functions (one for each wave) into a single formula, first the inflection points are calculated, then the regression is carried out to obtain the Gompertz (or other) functions, and finally the functions are integrated into a single formula. With this result, forecasts can be made.

A. Calculation of Inflection Points
The inflection points are obtained by performing the cubic linear regression (third-degree polynomial) in the respective intervals and then using the second derivative of these functions. Each inflection point can be interpreted as the day on which a new wave begins after the previous one (as seen in Fig. 1 and Fig. 2). In Peru, the first inflection point is day 253 and the second is day 596.

B. Integration of Functions
Let us start with the basic case of the union of two functions that represent two successive waves (of deaths or another accumulated variable). For illustrative purposes the Gompertz function will be used, although the results of this section are general, that is, they can be applied to other functions (e.g., the Boltzmann function).
Let G_1(x) and G_2(x) be two Gompertz functions (without loss of generality) that correspond to two successive waves. They are integrated into a single function by adding a characteristic/variable called p to the database, which marks with a single distinctive number all the rows (days) that correspond to a particular wave. The first wave is assigned a number (e.g., one) and the second wave is marked with another number (e.g., two).
The data is structured as shown in Table I. In the case of Italy (as also seen in Fig. 1), days 1 to 175 belong to the first wave and days from 176 onward belong to the second wave. (A basic study for Italy that does not include integrated waves is found in [24] and [25].) The integrated regression function F(x) then combines G_1(x) and G_2(x), each multiplied by a coefficient that depends on p. In this way, if p is replaced with 1 in F(x), G_1(x) is activated and G_2(x) is eliminated; conversely, if p is 2, G_2(x) is activated and G_1(x) is eliminated. The function F(x) can be simplified if values of zero and one are used for p, as was done in [4]; however, that simplification is not useful for generalizing to more functions.
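The labelling of rows by wave can be sketched as follows. This is an illustrative helper, not the authors' program. The function names are hypothetical, and the two-wave combination (2 - p)·G_1(x) + (p - 1)·G_2(x) is an assumption: it is one form that activates G_1(x) when p = 1 and G_2(x) when p = 2.

```python
from bisect import bisect_left


def wave_label(day, last_days):
    """Return the wave number p (1, 2, ...) for a given day.

    last_days is a sorted list holding the last day of every wave
    except the final one, e.g. [175] for the Italian example.
    """
    return bisect_left(last_days, day) + 1


def two_wave_function(g1, g2, x, p):
    """Combine two wave functions; assumes p is either 1 or 2."""
    return (2 - p) * g1(x) + (p - 1) * g2(x)


# Italy: days 1-175 belong to the first wave, day 176 onward to the second.
labels = [wave_label(day, [175]) for day in (1, 175, 176, 300)]
```

With p restricted to 1 and 2, exactly one of the two coefficients is one and the other is zero, so each row is evaluated by the function of its own wave.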
General case, mathematical version. If several functions are considered, whose quantity is specified by the value of n, proceed as follows. Let G_1(x), G_2(x), …, G_n(x) be functions that correspond to n successive waves. They are integrated into a single function, with activation controlled through a coefficient C_p, as follows: F(x) = C_1×G_1(x) + C_2×G_2(x) + … + C_n×G_n(x). As in the basic case, a feature (variable p) is added to the training samples (e.g., the successive numbers 1, 2, 3, ..., n).
A table of coefficients is constructed to discriminate between functions (Table II), and it is then generalized with a general formula for the coefficients of the sequence of functions according to the parameter p. In this formula, p is the number of the function or wave, C_p is the coefficient for the function G_p(x), and n is the total number of functions to integrate.
Proof. Let the function be G_p(x). Its coefficient is calculated as the multiplication of (n-1) factors, one for every row except the one corresponding to row p (to activate the function in question). The factors are those described in column G_p(x) of Table II, and they are obtained with two products, one for the factors above p and one for the factors below p. Replacing p in the factors generates two factorials, (p-1)! and (n-p)!, from the products of the top and bottom numbers; dividing by them offsets these values. Finally, a division by a sign factor nullifies the negative result that occurs when p is even.
Therefore, the final function is F(x) = C_1×G_1(x) + C_2×G_2(x) + … + C_n×G_n(x), with each coefficient computed as described above. The advantage of this formula (mathematical version) is that it does not depend on a programming language, compiler, or binary representation of the numbers. The disadvantage is the execution time required to calculate each coefficient, which is Θ(n), where n is the number of waves or functions; specifically, it requires, among other operations, (n+1) multiplications and one division.
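One way to realize the mathematical version is a product of (n-1) factors that vanish at every label except the one being activated, divided by the value of that product at the activating label. The sketch below is an assumption consistent with the description in the proof, not necessarily the authors' exact formula. The function names are hypothetical, and the index j is used for the function so that it is not confused with the row label p:

```python
def product_coefficient(j, p, n):
    """Coefficient of G_j for a row labelled p, with labels 1..n.

    Multiplies the (n-1) factors (p - i) for i != j and divides by the same
    product evaluated at p = j, which equals (j-1)! * (n-j)! up to sign.
    The loop makes the cost per coefficient linear in n.
    """
    numerator = 1
    denominator = 1
    for i in range(1, n + 1):
        if i != j:
            numerator *= p - i
            denominator *= j - i
    return numerator / denominator


def integrated_function(functions, x, p):
    """F(x) = C_1*G_1(x) + ... + C_n*G_n(x) for a row labelled p."""
    n = len(functions)
    return sum(product_coefficient(j, p, n) * g(x)
               for j, g in enumerate(functions, start=1))
```

For three waves, the coefficient of the first function reduces to (p - 2)(p - 3) / 2, which is one for p = 1 and zero for p = 2 or p = 3.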
For example, if there are three waves, the coefficient for the first function G_1(x) is built from the two factors that correspond to the second and third waves.
General case, the computational version (with the AND function at the bit level). It is possible to perform the integration by a simpler method, using bitwise operations, specifically the bitwise AND (&) operator. Unlike the previous method, the p values for each group (function) must be recorded in powers of two starting from one, that is, 1, 2, 4, 8, 16, ..., 2^(n-1). In the resulting formula for the coefficients, the & operator represents the bitwise AND operator.
Proof. Given the conditions, the result of the AND operation between p and the power of two assigned to a function is equal to that power of two if and only if both operands have the same value (by the definition of the AND operation), and it is zero otherwise. Then only one division by that power of two is required to get the 1, which finally selects the function.
The method is simple and efficient: the execution time to calculate each coefficient is constant, but it requires the AND function at the bit level. The statistical software R provides the bitwAnd(a, b) function, which, in the notation considered, calculates a & b. An empirical comparison of the performance of the two methods is not considered necessary, because the first method takes Θ(n) time per coefficient and the second takes constant time.
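The computational version can be sketched with Python's built-in `&` operator. The function names are hypothetical, and the coefficient is written here as (p & m) / m with m = 2^(j-1), which is an assumption that follows the proof: the AND result equals m only when p is the label assigned to function j, and one division then yields the 1.

```python
def bitwise_coefficient(j, p):
    """Coefficient of G_j for a row labelled p, with labels 1, 2, 4, 8, ...

    The label of function j is the power of two 2**(j-1). One AND and one
    integer division are needed, regardless of the number of functions.
    """
    mask = 1 << (j - 1)
    return (p & mask) // mask


def integrated_function_bitwise(functions, x, p):
    """F(x) as the sum of coefficient * G_j(x) over all functions."""
    return sum(bitwise_coefficient(j, p) * g(x)
               for j, g in enumerate(functions, start=1))


# Three waves labelled 1, 2 and 4; a row with p = 4 selects the third wave.
waves = [lambda x: 10.0, lambda x: 20.0, lambda x: 30.0]
selected = integrated_function_bitwise(waves, 0, 4)
```

Each coefficient costs a constant number of operations here, whereas a product-based coefficient needs a loop over all n labels.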

A. Case Study Analysis of the Global Number of Deaths from Covid-19 in Peru
To apply the procedure described in this work, the regression function of the global number of deaths from Covid-19 in Peru was analysed using three Gompertz functions, one for each wave. Day one corresponds to the first death, which occurred on March 3, 2020, and the time series extends until June 27, 2022 (a total of 844 days). Fig. 2 shows the observed data (in green), the Gompertz1 function for the first wave (blue), the Gompertz2 function for the second wave (red), and the Gompertz3 function for the third wave (black).
To calculate the day on which the first wave ends and the day on which the second wave begins (with a growth in the number of deaths), a third-degree simple polynomial regression was carried out, and the second derivative was applied to the resulting formula, as described in the methods section, in order to find the inflection point. On day 253 the first wave ends and on day 254 the second wave begins. The same procedure was carried out to calculate the second inflection point.

a) Regression Function:
The integrated regression function is the one observed in Fig. 2, with each wave in a different colour. The Gompertz model adjusted to the series of the accumulated number of deceased reports a Pearson correlation coefficient R = 0.9577994 and an explained variance of 91.73797%, quite acceptable measurements of the adjustment made. The alternative hypothesis is accepted: the correlation is not equal to zero (t = 96.691, df = 842, p-value < 2.2e-16).
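The reported test statistic can be checked against the reported correlation. The sketch below assumes the standard t-test for a Pearson correlation, t = R·sqrt(df / (1 - R²)) with df = n - 2. The function name is hypothetical, and the inputs are the values reported for the deaths series:

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing whether a Pearson correlation is zero."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Reported values for deaths in Peru: R = 0.9577994 over 844 days.
t_deaths, df_deaths = correlation_t_statistic(0.9577994, 844)
r_squared = 0.9577994 ** 2
```

With these inputs df is 842, t is about 96.69, and R² is about 0.9174, in agreement with the reported t = 96.691 and the explained variance of 91.73797%.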

b) Prediction:
The prediction was made for 30 days. Fig. 3 shows the projection, which corresponds to days greater than 844 (green). In the predicted data in particular, a small increase is observed, so it is necessary to take measures.

B. Analysis of the Global Infected Number - Peru
The regression function of the global infected number (Covid-19 in Peru) was analysed using three Gompertz functions, one for each wave. Day one corresponds to the first infected case (March 3, 2020), and the time series extends until June 27, 2022 (a total of 844 days). Fig. 4 shows the observed data (in green), the Gompertz1 function for the first wave (blue), the Gompertz2 function for the second wave (red), and the Gompertz3 function for the third wave (black).

a) Regression Function:
The integrated regression function is the one observed in Fig. 4, with each wave in a different colour. The Gompertz model adjusted to the series of the accumulated number of infected reports a Pearson's product-moment correlation R = 0.9814708 and an explained variance of approximately 96.33% (R² = 0.9814708²), quite acceptable measurements of the adjustment made. The alternative hypothesis is accepted: the correlation is not equal to zero (t = 148.63, df = 842, p-value < 2.2e-16).
In the case of the number of infections, no predictions are made because the last Gompertz curve G_3(x) does not fit adequately to obtain consistent results. Finally, it can be said that the function integration method worked well in both cases (deaths and confirmed). The contribution of this research is the presentation and illustration of two ways of integrating n regression functions (which may correspond to n waves of Covid-19 or others). The first is the mathematical version, which is independent of devices such as the binary representation in the computer. The second is the computational version, which has the advantage of being simple and efficient in time, specifically in the calculation of the coefficients (with constant time complexity). These results are general in relation to the cited literature and have many applications.
The epidemic curve of the number of deaths/infections was obtained with three Gompertz models integrated into one function. The adjustment provided a statistically reliable correlation measure, so forecasts were obtained for 30 days, that is, for the month of July 2022. It is concluded that the model fits the data well and is good for forecasting. Given the slight outbreak, it is necessary to follow preventive measures to prevent the spread of Covid-19 (with specific emphasis on Peru).
Finally, the detailed explanation and interpretation of the linear regression function and the Gompertz functions that compose it are useful to describe and compare the waves of Covid-19; however, they go beyond the objectives of the research.