Modeling and Interpretation of Covid-19 Infections Data at Peru through the Mitchell’s Criteria

In this paper, the criteria of Tom Mitchell, rooted in the philosophy of Machine Learning, are used to interpret weekly data on new Covid-19 infections in Perú. To this end, a mathematical scheme was constructed that combines Mitchell's criteria with the idea of propagation, as commonly used in modern physics to tackle complex interaction problems. With this scheme, both the 2009 season of the AH1N1 flu outbreak and the ongoing Covid-19 data were analyzed in terms of task, performance and experience. In contrast with the AH1N1 case, the Covid-19 data do not exhibit any performance in terms of minimizing infections during the first weeks of the outbreak, suggesting that precise actions to reduce infections were not taken appropriately.
Keywords: Covid-19; epidemiology; machine learning; Tom Mitchell; Monte Carlo


I. INTRODUCTION
Recently, the unexpected appearance of Coronavirus Disease (Covid-19 for short) [1] has reshaped the public health policies of global operators, forcing them to apply their most robust recovery and surveillance schemes in the shortest time, without an optimal use of resources: time, materials and human resources. Although the first wave of the pandemic is, to date, reaching its end in most countries, it is natural to ask what we have learned from the world-wide datasets.
In fact, as seen across the surveillance systems of all countries that are carrying out care schemes, the interpretation of data is expected to differ markedly among them, because societies have responded in culturally diverse ways since the arrival of the strain. Also relevant is their level of resilience, that is, how quickly they recover once conditions have shown a certain improvement.
This paper tries to answer the question: to what extent can machine learning schemes, seen as a universal computational tool, be useful for understanding recent data on infections by Covid-19? To answer that question, this paper has selected the Peruvian case, which exhibits a marked difference between the current Covid-19 data and those of the 2009 AH1N1 season. Current data between March and July exhibit peaks and fluctuations, which would reinforce the hypothesis that in many cases (countries) the dynamics of spread and subsequent infection by Covid-19 appear to be strongly related to randomness. Accordingly, this paper assumes a priori that the time evolution of the rate of infections is to some extent dictated by the rules that govern propagation as commonly seen in physics, as developed by Feynman [2]. Consequently, one can postulate that the spread of and infection by the virus follow the mathematical structure of a propagator integral, written down as Eq. (1), with G(t_2 − t_1) the causal Green's function, causal in the sense that t_2 > t_1, which plays the role of the mathematical mechanism supporting the transition from state 1 to state 2, and H(t_1) the input function. Although in its original formulation physical propagation depends on spacetime, as a first step one can test this integration as a mathematical rule that governs the time evolution of current infections in large cities. Under the assumption that it is actually the tool that dictates the spread of the strain, any variation of the kernel might be advantageous for managing the rate of infections. Thus, if Eq. (1) is applicable to the ongoing problem of intercontinental infection, the human intervention to alleviate the coronavirus outbreak can be modeled through the kernel's free parameters.
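To make the idea of a propagator integral concrete, the following is a minimal numerical sketch and not the scheme used in this paper. It assumes that the integral has the convolution form of a kernel G(t_2 − t_1) multiplied by an input H(t_1) and integrated over t_1 from 0 to t_2, that the kernel is a Gaussian in the time lag with width tau, and that the input is constant. The names gaussian_kernel and propagate, the trapezoid rule and all numerical values are illustrative assumptions.

```python
import math


def gaussian_kernel(lag, tau):
    """Assumed causal Gaussian kernel: zero when lag < 0 (t_2 must exceed t_1)."""
    if lag < 0:
        return 0.0
    return math.exp(-(lag / tau) ** 2)


def propagate(t2, input_fn, tau, steps=2000):
    """Trapezoid estimate of the integral of G(t2 - t1) * H(t1) for 0 <= t1 <= t2."""
    dt = t2 / steps
    total = 0.0
    for i in range(steps + 1):
        t1 = i * dt
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * gaussian_kernel(t2 - t1, tau) * input_fn(t1)
    return total * dt


# Hypothetical constant input H(t) = 1 and two kernel widths.
p_const = propagate(10.0, lambda t: 1.0, tau=1.0)
p_narrow = propagate(10.0, lambda t: 1.0, tau=0.5)
print(p_const, p_narrow)
```

In this toy setting a narrower kernel (smaller tau) lowers the propagated value, which illustrates how an intervention could in principle be encoded in the kernel's free parameters.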
Once the problem of spread and infection is modeled through propagator theory, this work adopts the philosophy of Machine Learning in order to translate the language of the dataset into the view of Tom Mitchell [3], which states that any system can be universally described by three actions: (i) task, (ii) performance, and (iii) experience. In this manner one can use this methodology to extract information from any statistical dataset, such as those recently collected because of the Covid-19 pandemic. The robustness of Machine Learning can also be used to carry out comparisons with previous pandemics such as the 2009 AH1N1 [4], in order to find similarities or discrepancies in the schemes that were applied to optimize the actions taken by the public health systems. Although in principle one can claim that AH1N1 and Covid-19 might not be associated with each other from any angle of analysis, under the methodology applied in this paper a noteworthy association between AH1N1 and Covid-19 suggests a possible link between the rate of infections and the public health policies that would determine how successfully a city or country is managed in periods of crisis created by a pandemic.
In the second section, the theoretical proposal based on the implementation of Green's functions and the possibility of a kind of entropy is presented. There, Mitchell's criteria are introduced in a mathematical manner. In the third section, once the theory is built, it is applied to the 2009 AH1N1 Peruvian season and subsequently to the current 2020 Covid-19 Peruvian data, and the Machine Learning interpretation is carried out. Finally, the conclusion of the paper is presented.

A. The Concept of Propagator and Green Function
In physics, the propagation between two space-time points is dictated by evolution operators that depend entirely on the dynamics and physical observables of the system [5] [6]. Therefore the action of propagation must have a tangible cause, namely any action that produces changes in the system. For example, Eq. (1) can be extended from time t_1 to t_3, giving Eq. (2), in which the Green's functions G(t_3 − t_2) and G(t_2 − t_1) make possible the propagation from t_1 to t_3 passing through t_2, the time at which the last propagation was caused. From this one can generalize to a large number L of propagations, giving Eq. (3). While P_{1→L}(t_L) encloses a chain of time propagations, it is perceived as the probability of a system undergoing a transition between the times t_1 and t_L. Indeed, one can assign to H(t_ℓ) the role of input function that is convolved with the propagators. Actually, one has L − 2 input functions. It should be noted that the case L = 3 gives Eq. (2).
Consider, for example, a Gaussian profile that models the time propagation, together with its respective input function depending on the constant τ_ℓ, so that one can write down the corresponding integral, where the change of variable u = t_{ℓ+1} − t_ℓ with du = dt was used. The input function can be written, for example, in terms of constants h_ℓ. For the sake of simplicity one assumes that all h_ℓ = h and τ_ℓ = τ have the same value, which also means that the interactions of the system have no effect along a complete cycle of interactions, so that the system has the same chance of keeping its initial state along the subsequent interactions. The resulting naive expression is illustrated in Fig. 1 for up to 4 values of L.
Owing to their Lorentzian nature, all peaks are centered at the same value. The amplitudes were varied by introducing the constant (1.5)^L, which multiplies Eq. (6).
The Gaussian and Lorentzian profiles can be combined, in the sense that together they can yield an approximate quantitative description of the evolution of a limited period of a pandemic [7][8][9] [10]. In this manner the expression is rewritten, yielding the distributions shown in Fig. 2. The black arrows indicate the decrease of the peaks in time. The fact that all peaks lose their initial value, as indicated by the blue arrow at the first peak, is due to some action, which in this case is the inclusion of the term 0.85 × (0.2)^L. This term, associated with the term h^L given in Eq. (5), describes the deterioration of the amplitude over the successive times at which the system experiences an interaction.
For instance, one can assume that the curves denote the probability of having a certain number of infections, also known as the rate of infection per unit of time. Thus, in this toy theory, the incorporation of 0.85 × (0.2)^L can be interpreted as the decrease in the rate of infections imposed by the initial conditions of the system. In fact, the positions of the arrows would denote the times at which a decision has been imposed, such as quarantine, curfew or social distancing. Therefore, the model yielding peaked distributions can be seen as a methodology to describe, for example, the rate of infections once an outbreak has been confirmed. Thus, it is possible to define the number of infections as the product N = n_0 P, with n_0 the initial number of identified infections. The task is then to reduce this number through concrete actions, according to the technology available to each public health operator in the affected countries. In practice, N would depend on a set of free parameters that characterize the intensity of the pandemic, such as population, human behavior and the capacity to comply with social rules once the lethality of the strain is identified.
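As a worked illustration of how the factor 0.85 × (0.2)^L reduces successive peaks, the short sketch below tabulates that factor for L = 1 to 4 and converts it into a number of infections through N = n_0 P. The value n_0 = 1000 and the function names are hypothetical and chosen only for illustration; the factor is treated here as the peak value of P, which is a simplifying assumption.

```python
def peak_factor(L, scale=0.85, ratio=0.2):
    """Amplitude factor scale * ratio**L quoted in the text for the L-th peak."""
    return scale * ratio ** L


def infections(n0, probability):
    """Number of infections N = n0 * P."""
    return n0 * probability


n0 = 1000  # hypothetical initial number of identified infections
factors = [peak_factor(L) for L in range(1, 5)]
for L, factor in enumerate(factors, start=1):
    print(L, round(factor, 5), round(infections(n0, factor), 2))
```

Each peak is five times lower than the previous one, since consecutive factors differ by the ratio 0.2.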

B. Pandemic as Entropy
From Eq. (3), the chain of propagators that introduces the concept of pandemic risk can also be seen as a kind of Shannon entropy. In fact, consider Eq. (9) under the assumption that the time differences are the same and equal to ∆t for all the propagations (which can be perceived as a fast propagation of the strain once the outbreak has started). In this manner Eq. (9) can be rewritten and, by applying the logarithm, the Shannon entropy is obtained, indicating that the entropy receives contributions only from the propagators, whereas the logarithm of the integral over the input functions turns out to be δS = Log[∫ H(t_ℓ) dt_ℓ], the error of the entropy. On the side of Epidemiology, Eq. (12) can also be interpreted as the disorder of the transport mechanism of the strain [11], which is dictated by nonlinearities [12] that lead to a kind of anarchy of the system. Any city or place facing the arrival of a virus would exhibit such behavior, which causes social [13] and, to some extent, economic disorder [14]. In order to work through Eq. (12), the input function is assumed to be a polynomial distribution, so that a power series is applied; thus, with t_ℓ → u, one obtains the entropy error. From this one can verify whether this error is also an entropy. The associated Shannon entropy is then written down as Eq. (14), in which the first two terms on the right-hand side follow the structure of a Shannon entropy. It should be noted that the term (C_m/m!) can be perceived as a kind of probability. For large values of the integer m this probability turns out to be null. In this way there is a set of values of m, L and u that cancels the term containing Log(C_m/m!), so that the entropy becomes null. Clearly this demands finding the best values of C_m, M, L and u, which would be the task of a Machine Learning algorithm.
www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020
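The step of applying the logarithm can be sketched numerically. The example below assumes, for illustration only, that the chain factorizes into L identical single-step propagator integrals (equal ∆t) multiplied by one integrated input term, so that the logarithm splits into a propagator part and the error term δS. The variable names and numerical values are hypothetical.

```python
import math


def log_chain(step_integral, n_steps, input_integral):
    """Logarithm of (step_integral ** n_steps) * input_integral, split into two parts."""
    propagator_part = n_steps * math.log(step_integral)
    delta_s = math.log(input_integral)  # contribution of the input functions
    return propagator_part, delta_s


# Hypothetical values: single-step integral, number of steps, integrated input.
g, L, h_int = 0.8, 5, 1.3
propagator_part, delta_s = log_chain(g, L, h_int)
direct = math.log(g ** L * h_int)
print(propagator_part, delta_s, direct)
```

The sum of the two parts reproduces the logarithm of the full product, which is the only property used here; the sign convention and normalization of the entropy are left as in the text.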

C. Theoretical Formulation of Mitchell's Criteria
As mentioned above, the philosophy of Machine Learning action [14], [15] can be summarized by the criteria postulated by Tom Mitchell [16], in which it is assumed that the system has a (i) task to be done, and that this task demands a (ii) performance aimed at optimizing the system's parameters. After carrying out the performance, the system acquires (iii) experience from the obtained results.
Here one states the main argument of this paper, namely that the probabilistic character of the rate of infections as written in Eq. (3) can be translated into Mitchell's criteria. In fact, from Eq. (3) it is feasible to formulate the following algorithm based on Mitchell's criteria [17]. Consider the number of infections at the ℓ-th time as N(t_{L+1}) = n_0 𝒩(t_{L+1}), so that Eq. (3) can be modified accordingly. Clearly it was assumed that 𝒩(t_{L+1}) = ∫ G(t_{ℓ+1} − t_ℓ) 𝒩(t_ℓ) dt_ℓ, in which 𝒩 experiences a variation from the time t_ℓ to t_{L+1} through L interactions producing the subsequent propagation.
Thus, the task consists in reducing the number of infections through L different periods of pandemic evolution. It is noteworthy that artificial intervention in the evolution of the outbreak can be applied in order to counteract the progress of the pandemic and therefore to minimize the infections.
In this manner, the minimization of N(t_{L+1}) is a genuine obligation of the system [18][19] [20]. To accomplish this, one requires the best strategy, which in the picture of Tom Mitchell is known as the performance.
The performance requires postulating the best representation of the Green's function as seen in Eq. (3), which implies finding the best value for the integer number L. Once this number has been determined, for example L < L, one proceeds with the integrations. It is then suitable to implement a Monte Carlo step that makes the decision aimed at obtaining an optimal, reduced number of infections. Thus, when the outcome is not in accordance with the desired number of infections, i.e. N(t_L) < N(t_{L−1}), one sets L → L + 1 and verifies again whether there is a reduced number of infections. The action is therefore repeated as long as N(t_{L+1}) > N(t_L). In this manner, the process is stopped when the condition is verified for an integer number n satisfying L − n < n < L + 1. For example, if at t_{L+1} a pandemic yields 100 infections over a period of 10 days, then one expects 1000 over 100 days. Of course, for times > t_{L+1} one might expect a certain nonlinearity, since the rate of infections is governed partially by randomness rather than by deterministic laws [21].
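The stopping rule and the linear expectation described above can be sketched as follows. This is a deterministic stand-in and not the Monte Carlo step itself: the index L is increased until the number of infections falls below that of the previous period. The infection curve toy_infections is a hypothetical placeholder, and the function names are illustrative.

```python
def linear_extrapolation(cases, days, target_days):
    """Scale a case count linearly with the length of the period."""
    return cases * target_days / days


def first_decreasing_step(infections_at, start, max_steps):
    """Increase L until N(t_L) < N(t_{L-1}); return that L, or None if never reached."""
    L = start
    while L <= max_steps:
        if infections_at(L) < infections_at(L - 1):
            return L
        L += 1
    return None


def toy_infections(L):
    """Hypothetical curve that rises to a peak at L = 5 and then falls."""
    return 100 - (L - 5) ** 2


stop_at = first_decreasing_step(toy_infections, start=1, max_steps=20)
expected = linear_extrapolation(100, 10, 100)
print(stop_at, expected)
```

For the toy curve the loop stops at the first period after the peak, and the linear rule reproduces the 100-to-1000 example given in the text. A genuine Monte Carlo version would replace the deterministic comparison with a randomized acceptance step.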
Thus, once the path has been identified, one can reconstruct the Green's functions of the system, namely those that the system has adopted to yield a certain number of infections. Clearly, this reconstruction demands knowledge of the free parameters involved and of other unknown quantities that may not have been visible at the beginning of a pandemic. A crude estimate of the Green's functions is obtained by confronting them with data that display the number of infections per unit of time. One can thus relate the data to the Green's functions of the system and apply a fit to the acquired data, which are in essence the left-hand side of that relation. The reconstruction of the product of propagators would therefore depend entirely on the quality of the fit, expressed in terms of χ²/d.o.f. One should note that after the fit is done the Green's function can be written as:
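The fit quality χ²/d.o.f. mentioned above can be computed as in the following minimal sketch. It uses the standard definition, the sum of squared residuals divided by the squared uncertainties, over the number of data points minus the number of fitted parameters. The weekly counts, model values and uncertainties below are invented for illustration and are not the Peruvian data.

```python
def chi2_per_dof(observed, predicted, sigmas, n_params):
    """Reduced chi-square: sum(((obs - pred) / sigma)**2) / (n_points - n_params)."""
    if not (len(observed) == len(predicted) == len(sigmas)):
        raise ValueError("all sequences must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigmas))
    return chi2 / dof


# Hypothetical weekly counts, model values and uncertainties.
observed = [10, 20, 30, 20, 10]
predicted = [12, 18, 30, 22, 9]
sigmas = [2, 2, 2, 2, 2]
quality = chi2_per_dof(observed, predicted, sigmas, n_params=2)
print(quality)
```

A value close to 1 is usually read as a fit compatible with the assumed uncertainties, while much larger values indicate that the chosen propagator shape does not describe the data.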

A. The Peruvian 2009 AH1N1 Season
The so-called pig-flu strain [22][23] [24] appeared in Perú [25] during the first week of May, through people who had been abroad. The infections started in Lima city, which was the main place of strong spreading, as seen in Fig. 3. In effect, infections reached their peak on June 20th in Lima city. Because of the outbreak, social measures were imposed on people in order to block the mobility of the strain and avoid its spread among the vulnerable population. Such social restrictions had an interesting effect, as noted in the appearance of a secondary peak on July 2nd. Clearly, from this date Lima's infections showed a descending behavior that can be associated with the social regulations. On the other hand, the data for the rest of the provinces exhibit a first peak on July 10th. One can ask why the two distributions are not superimposed on each other. The data reveal that the rapid rise of infections in the provinces is not in phase with Lima city because of human mobility, which might be nonlinear. The reason why Lima city exhibits more infections might be entirely related to its total population. Thus, there is a certain probability that a conjunction of external variables gives rise to the gap between the peaks of cases in Lima city and in the provinces.

B. The Machine Learning Parameters
Eq. (7) is used to validate Mitchell's parameters on the data of Fig. 3. It is assumed that the Lima data of AH1N1 are a sum of up to two different distributions. Therefore the law that models the infections is given by N = n P_{1→2}. To accomplish this, the change 0.85 × (0.2)^L → 850 × (0.07)^L was applied. The denominator was changed as well. Thus, in the scenario of Machine Learning, the quantity 850 denotes the expected number for a period of 10 weeks. According to Mitchell's criteria, the task consists in reducing the first peak [26], which in turn is equivalent to imposing social restrictions that minimize human contact. The performance is then focused on the different methods that would reduce the infections. It should be noted that during the AH1N1 pandemic in Perú neither quarantine nor curfew was applied; instead, social activities were closed. Mathematically speaking, the task is modeled by a Lorentzian distribution depending on the term (x − (4 + 3 × L))^2, which exhibits the first two peaks and indicates that after any action the second peak is reduced in height; in conjunction with this, one expects the numerator in Eq. (7) to minimize the infections. In addition, it should be noted that the term (0.2)^L plays a critical role. A slight variation of the value, 0.2 + δ with δ a small number << 1, yields abrupt changes in the morphology of the spectrum. Thus, for the present study (0.07)^L governs the behavior of Fig. 3. One can see that it is strongly correlated with the resulting integration of the Gaussian profiles. In fact, the term (0.07)^L directly affects the value of x^{L/2}, encompassing the role of the propagators, which here is to minimize the first peak. In this manner, with the task modeled by a Lorentzian distribution, one can anticipate the dates at which a decrease of infections is expected.
In this way the experience is given by the second peak, after a period of decisions by the local public health operators. Fig. 4 indicates the experience in the 9th week, after the implementation of Mitchell's criteria between the 5th and 8th weeks. The arrow between the task and the experience denotes the performance, which translates into the social restrictions imposed to avoid the propagation of the strain among people. In this way, the management of the AH1N1 pandemic in Lima city might be seen as efficient as well as sustainable in minimizing the effects of the arrival of the strain.
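A minimal sketch of a Lorentzian sum built from the quantities quoted above is given here for illustration. It assumes terms of the form 850 × (0.07)^L divided by a width constant plus (x − (4 + 3L))^2, summed over L = 1 and L = 2. The width constant and the range of L are assumptions, since the full expression of Eq. (7) is not reproduced, and the function name is hypothetical.

```python
def lorentzian_sum(x, n_expected=850.0, ratio=0.07, width=1.0, n_terms=2):
    """Sum over L of n_expected * ratio**L / (width + (x - (4 + 3*L))**2).

    width and n_terms are illustrative assumptions, not fitted values.
    """
    total = 0.0
    for L in range(1, n_terms + 1):
        center = 4 + 3 * L  # week at which the L-th term is centered
        total += n_expected * ratio ** L / (width + (x - center) ** 2)
    return total


first_center = lorentzian_sum(7.0)    # value at the center of the L = 1 term
second_center = lorentzian_sum(10.0)  # value at the center of the L = 2 term
print(first_center, second_center)
```

With these assumed values the curve is much lower at the second center than at the first, reflecting the suppression carried by the factor (0.07)^L.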

C. The Peruvian Covid-19 Pandemic
The current Covid-19 pandemic has struck the world-wide public health schemes in an unexpected manner, and to date (the date of submission of this paper) Perú [28][29] [30] is one of the most affected countries in terms of the number of new cases per day. In fact, in Fig. 5 the morphological composition displays up to two phases, the first between the 1st and 13th weeks and the second for the remaining data. Although the country's government has dictated social restrictions such as quarantine and curfew, the number of cases has kept increasing in an unstoppable manner, as seen in the histogram. In fact, despite the fact that social distancing and face protection were imposed, one can see that the new cases per week showed a rapid growth of approximately 61% (with respect to the 27686 cases of the 12th week), as seen in the jump from the 12th to the 13th week. It is noteworthy that this is perceived as the peak of the first wave. Beyond this, the data exhibit a morphology that can be understood as the beginning of a second distribution, as the consequence of all the actions that were imposed before or after the 13th week. Under the assumption that the first peak has a substantial contribution from Lima city, as reported by official data, it is feasible to state that the second distribution is due to cases from the provinces. Under this view, and by comparison with the 2009 AH1N1 season data, one can see that human mobility might be the main cause of the formation of the second distribution. Thus, one can argue that infections were transported from Lima to the provinces through mobility, as seen in the opening dates of travel: July 1st (terrestrial) and July 15th (aerial). One can anticipate, in the scenario of Mitchell's criteria, that the performance to contain the infections could have been to some extent inaccurate.
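The size of the jump quoted above can be made explicit with a short calculation. The sketch below applies an approximate 61% growth to the 27686 cases of the 12th week; the resulting figure for the 13th week is only implied by these two numbers and is not an officially reported count. The function names are illustrative.

```python
def grow(previous, percent):
    """Apply a percentage increase to a previous count."""
    return previous * (1 + percent / 100)


def percent_change(previous, current):
    """Percentage change from previous to current."""
    return 100 * (current - previous) / previous


week12 = 27686  # cases of the 12th week, as quoted in the text
implied_week13 = grow(week12, 61.0)  # implied by the approximate 61% growth
print(round(implied_week13), percent_change(week12, implied_week13))
```

Under these numbers the 13th week would correspond to roughly 44.6 thousand new cases.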

D. The Machine Learning Interpretation
Again, Mitchell's criteria are applied to interpret the histogram of Fig. 5. Because of the evident presence of two well-defined distributions that resemble each other, one can see that the entire system does not exhibit any flatness, as would be required to claim the end of a first wave. Thus, it is fair to claim that while a possible first wave was ending in Lima city, a second one, essentially due to new cases coming from the provinces, had been forming before the peak at the 13th week. In this way, the first peak is perceived as the task of the system, namely to reduce it. However, it is clear that the superposition from the provinces might add a kind of bias to the data. Once the task has been identified, one can apply a strategy, or performance, whose target is to decrease the peak in the subsequent weeks. The performance as modeled by a Gaussian profile appears to be rather limited in provoking a fast decrease of the number of new infections. In fact, the term 50 + (x − (3 + 12 × L))^2 was used, with N the new cases per week and n the expected number of infections. In this manner, the new infections n become a free parameter of the system. From Fig. 5 one can see that the performance is not visible as a tangible action that has caused variations in the curve. One argument for why the performance cannot be identified in the data is the superposition of the Lima and provinces data, which would generate a kind of unrecognizable noise affecting the national data. Because of this, the performance turns out to be a constant and is modeled by a Gaussian profile with a width that is randomly fixed. This implies that τ → β, a constant. Thus, one has that ∫_0^∞ Exp[−((t_{ℓ+1} − t_ℓ)/τ)^2] dt = √(βπ). The experience therefore acquires the same morphological shape as the task. This is seen in Fig. 6, where task and experience are modeled by two Lorentzian shapes separated by a gap of approximately 18 weeks.

IV. DISCUSSION AND CONCLUSION
The fact that N_COVID does not exhibit Mitchell's performance, yet to some extent reproduces Fig. 5, is interpreted as follows: performance was applied before the first peak of the 13th week, so that it could have been broken as an effect of the end of the national quarantine as well as the lifting of the curfew at noon. On the other hand, while the appearance of peaks can also be seen as the output of the actions imposed in the weeks after the strain was recognized, in the language of Mitchell's criteria the task as a focused fact is not reflected in the data. In this manner, the input Lorentzian distributions were not affected by a constant propagation, in contrast to the AH1N1 case, in which Mitchell's criteria fit the data well. Indeed, a constant performance in terms of the Feynman propagator is interpreted as the system not having any scheme (or strategy) to experience variation in time. Thus, one can argue that for the ongoing Covid-19 pandemic in Perú, the translation into terms of Machine Learning could not involve the action of performance, a crucial step in managing the evolution of the system. Since the mathematical probability given by Eq. (7) was calculated using a Gaussian profile that models the propagation, one can conclude that the appearance of a second peak is due to a very limited and almost invisible performance. Here, one can ask about the use of a different propagator distribution. However, that would demand introducing a set of free parameters that might not be fitted to the data, which would move Mitchell's criteria away from a realistic interpretation that must be adjusted to the data being acquired.