Clustering of Multidimensional Objects in the Formation of Personalized Diets

When developing personalized diets (personalized nutrition), it is necessary to take into account the individual physiological nutritional needs of the body associated with gene polymorphism among consumers. This greatly complicates the development of rations and increases their cost. A methodology for forming target diets based on the multidimensional objects clustering method is proposed. Clustering in the experimental group was carried out on the basis of an integral assessment of reliable risks of developing disease conditions for selected metabolic processes, taking the genetic data of the participants into account. The proposed method reduced the number of typical individual diet solutions required for the experimental group from 10 to 3.
Keywords: Multidimensional objects clustering method; integral assessment of reliable risks; nutritional needs of the body; personalized nutrition


I. INTRODUCTION
Modern studies of the human genome have identified many genes responsible for metabolic processes whose polymorphism plays a significant role in the occurrence of metabolic disorders and the development of diseases. Identifying which alleles of such genes a person carries helps to determine the risk factors for particular health disorders and to develop optimal measures that prevent the negative influence of environmental factors on the manifestation of genetically determined disorders [1].
One of the decisive factors determining the diet is the human genome [2]. A reliable statistical relationship has now been established between the presence of certain variants (alleles) of specific genes and susceptibility to more than 150 hereditary diseases [3]. The onset of a disease may be associated with disruptions in the functioning of individual organs and systems and may result from a disturbed nutritional status, where the diet does not take into account the genetic influence on the nutrient needs of the body. Thus, food products and food rations designed to meet nutrient needs corrected for the genetic characteristics of a particular organism automatically prevent the adverse functioning of problem organs and systems [4][5][6].
The use of statistical methods for analyzing medical information is currently relevant. With the development of technology, the scope of their use is expanding and now includes the information processing methods known as Data Mining.
Clustering is one of the main effective and widely used Data Mining methods for large amounts of information. The method searches for signs of similarity between objects in a particular subject area and then merges the objects into subsets (clusters) according to the established signs of similarity.
Data Mining comprises methods for detecting and collecting data as well as for its intelligent analysis. It is a multidisciplinary field that emerged and continues to develop on the basis of such sciences as applied statistics, pattern recognition, artificial intelligence, database theory, etc.
This study examined the effect of a limited list of gene panels on metabolic processes with the calculation of the integral assessment of reliable risks for the development of disease conditions, and also proposed the application of the multidimensional objects clustering method in order to form diets for target groups of consumers.
The clustering task arises because the mass (industrial) formation of rations requires finding typical solutions. These solutions should be made for target groups of consumers assigned to a particular cluster. It should be noted that the clusters themselves are unknown in advance. Therefore, in order to accumulate information about clusters during scientific research, the clustering problem is solved and the method of forming clusters is worked out [7,8].

II. RESEARCH METHODOLOGY
The group included people of European type (men and women) of about the same age (28-35 years old), born and living for several generations in the region of the Central Russian Upland. The polymorphisms of genes that are involved in the main metabolic processes and cause the risk of certain diseases were selected as the most significant ones. The processes considered were biotransformation of xenobiotics, metabolism of vitamins, and assessment of psychoemotional status.
Table 1 lists the controlled alleles of genes and the corresponding risks of hereditary multifactorial diseases for the above-mentioned metabolic processes.
Biotransformation of xenobiotics is a biochemical process during which substances are transformed under the action of various enzymes of the body [9][10][11][12][13][14][15][16][17]. Its biological meaning is the transformation of a chemical substance into a form suitable for removal from the body. Four genes of the activation phase of xenobiotics (CYP1A1*2B, *4, CYP2D6*3, *4, CYP2C9*2, *3 and CYP2C19*2) and four genes of the detoxification phase (GSTT1, GSTM1, NAT2 and TPMT) were included in the biotransformation panel under study. To assess the vitamin status of the organism, marker genes that indicate risks of a reduced concentration of vitamins in the organism of the genome carrier (NBPF3 (ALPL), FUT2, BCMO1, APOA5) were studied [18,19]. To assess the psychoemotional status of participants in the experimental group, the activities of the genes (DRD-2A, SR (HTR2A)) responsible for the synthesis of serotonin and dopamine enzymes were also identified [20,21].
These gene panels are associated with a predisposition to a number of the most common diseases and are included in the list of genetic tests of most medico-genetic laboratories.

| Metabolic process encoded by a group of genes | Controlled gene alleles | Corresponding risk |
|---|---|---|
| Psycho-emotional status | DRD-2A | The formation of addiction to alcohol and narcotic substances due to a deficiency in the synthesis of serotonin and dopamine. |
| Psycho-emotional status | SR (HTR2A) | Associated with increased risk of paranoid schizophrenia |

Experiment participants were assigned reference numbers from 1 to 10. Testing was performed by analyzing saliva using the micronucleus test of the buccal epithelium. As a result of testing, data were obtained on the presence of polymorphisms in the studied genes in the homozygous safe (C / C), heterozygous (C / A) or homozygous disease-predisposing (A / A) form. For ease of processing the experimental data, a polymorphism in the homozygous disease-predisposing form was scored as 2 points, in the heterozygous form as 1 point, and in the homozygous safe form as 0 points. Table 2 shows information on the presence of polymorphisms, in one form or another, in the genes or their alleles tested in the experiment.
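The scoring rule described above can be sketched in Python as follows. The genotype labels follow the text; the function name `genotype_score` and the sample genotypes are hypothetical and are not data from the study.

```python
# Points per genotype form, as described in the text:
# homozygous safe = 0, heterozygous = 1, homozygous predisposing = 2.
GENOTYPE_POINTS = {"C/C": 0, "C/A": 1, "A/A": 2}


def genotype_score(genotype: str) -> int:
    """Return the risk score (0, 1 or 2) for one genotype label."""
    # Normalize spacing so that "C / A" and "C/A" are treated alike.
    key = genotype.replace(" ", "").upper()
    if key not in GENOTYPE_POINTS:
        raise ValueError(f"unknown genotype: {genotype!r}")
    return GENOTYPE_POINTS[key]


# Hypothetical genotypes for one participant (illustrative only).
sample = {"NAT2": "A / A", "APOA5": "C / A", "GSTM1": "C / C"}
scores = {gene: genotype_score(g) for gene, g in sample.items()}
print(scores)
```

Keeping the mapping in a single dictionary makes it easy to see that only three genotype forms are scored and that any other label is rejected rather than silently counted.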
Table 2 shows the individual and integral assessments of reliable risks of expression of the genes and their alleles tested in the experiment. The table is compiled in the form of a matrix. The sum of points accumulated by each participant over the studied gene alleles expresses the integral risk assessment for that participant in the experimental group (ranging from 0 to 30). The summary line with the sum of risks for each group member in Table 2 makes it possible to give an integrated risk assessment for the whole spectrum of diseases determined by the considered gene panels.
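As an illustration of how the summary line of such a matrix can be obtained, the following minimal sketch sums the points of each participant. The matrix values and the name `integral_risk` are hypothetical; only the 0/1/2 scoring and the row-wise summation come from the text.

```python
def integral_risk(points_by_allele: dict) -> int:
    """Sum the 0/1/2 points of one participant over all tested alleles."""
    for allele, points in points_by_allele.items():
        if points not in (0, 1, 2):
            raise ValueError(f"invalid score {points!r} for {allele}")
    return sum(points_by_allele.values())


# Hypothetical fragment of the score matrix (illustrative only):
# keys are participant numbers, values map alleles to points.
matrix = {
    1: {"GSTT1": 0, "NAT2": 2, "APOA5": 2},
    2: {"GSTT1": 1, "NAT2": 1, "APOA5": 0},
}

totals = {participant: integral_risk(row) for participant, row in matrix.items()}
print(totals)
```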
Mathematical data processing was performed using soft computing, namely clustering of multidimensional objects [22][23][24][25][26][27]. The following notation is used: $Mp(i)$ is the weight of importance of the risk for the condition of the $i$-th gene, $i = 1, \dots, M$; $X(n, i)$ is the risk assessment in points according to the condition of the $i$-th gene in object $n$, $n = 1, \dots, N$, $i = 1, \dots, M$.
The initial set $C_0$ must be divided into a set of clusters $C_k$, $k = 1, \dots, K$, such that any pair of clusters has no common elements, that is, any object can be in only one cluster:
$$C_0 = \bigcup_{k=1}^{K} C_k, \qquad C_k \cap C_l = \varnothing \quad \text{for } k \neq l.$$
It is required to determine such $C_k$ that maximize the criterion $U$, where $U(K_o)$ is the optimal value of the clustering quality criterion, $U_1(K)$ is the compactness of classes with $K$ clusters, and $U_2(K)$ is a measure of similarity of classes with $K$ clusters.
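The two partition requirements stated above (every object belongs to some cluster, and no object belongs to two clusters) can be checked with a short sketch. The function name `is_partition` is hypothetical; the example grouping uses the participant numbers reported in the Results section.

```python
def is_partition(initial_set, clusters) -> bool:
    """Check that the clusters are pairwise disjoint and together cover the initial set."""
    seen = set()
    for cluster in clusters:
        members = set(cluster)
        if members & seen:
            # Some object already belongs to an earlier cluster.
            return False
        seen |= members
    # Every object of the initial set must be assigned, and nothing else.
    return seen == set(initial_set)


participants = range(1, 11)
grouping = [[8], [9, 4, 2, 7, 5], [3, 10, 6, 1]]
print(is_partition(participants, grouping))
```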
The measure of similarity between two objects is determined on the basis of the potential function $f(S_i, S_j)$. Here $K$ is the number of classes at the current classification step; $N_k$ is the number of objects in the class $C_k$; $f(S_i, S_j)$ is the potential function of two objects $S_i$ and $S_j$; and $\rho(S_i, S_j)$ is the distance between objects $S_i$ and $S_j$ in the space of characteristics $X$, taking the metric into account.
Thus, optimal splitting into clusters implies maximizing the criterion $U(K_o)$ (see formula 5). In substance, this statement means that each cluster collects related objects, while objects of different clusters differ significantly. This problem belongs to the class of soft computing problems solved by the methods of integer mathematical programming. To solve the problem, a set of programs for assessing the quality of multidimensional objects was used [9].
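To make the role of a potential function concrete, here is a minimal sketch under explicitly assumed forms. The weighted Euclidean distance, the potential 1 / (1 + alpha * rho^2), and the mean within-cluster potential used as a compactness measure are common textbook choices and are assumptions here, not the formulas of this paper. All function names and the sample vectors are hypothetical.

```python
import math
from itertools import combinations


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two score vectors (assumed metric)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def potential(x, y, weights, alpha=1.0):
    """Assumed potential function: 1 for identical objects, decreasing with distance."""
    rho = weighted_distance(x, y, weights)
    return 1.0 / (1.0 + alpha * rho ** 2)


def mean_within_potential(cluster, weights):
    """Average pairwise potential inside one cluster (an assumed compactness measure)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # a single-object cluster is treated as fully compact (assumption)
    return sum(potential(a, b, weights) for a, b in pairs) / len(pairs)


# Hypothetical score vectors over two genes with equal weights.
weights = [1.0, 1.0]
print(potential([0, 0], [0, 0], weights))
print(potential([0, 0], [2, 0], weights))
print(mean_within_potential([[0, 0], [2, 0]], weights))
```

With any potential of this kind, objects with similar score vectors yield values near 1, so a cluster of related objects has a high average potential.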

IV. RESULTS AND DISCUSSION
When solving the clustering problem for the study group, four metabolic processes were distinguished: biotransformation of xenobiotics, activation phase (process number 1); biotransformation of xenobiotics, detoxification phase (process number 2); metabolism of vitamins (process number 3); assessment of psychoemotional status (process number 4).
Each process is encoded by several genes (from two genes for the psychoemotional status up to four for each of the other processes). A possible condition for clustering the participants is that they have approximately the same total number of points within each process and, accordingly, close values of the integral assessments of reliable risks for the amplification of disease states in the selected metabolic processes.
The problem of combining objects into clusters based on the data from Table 3 was solved according to the condition that integral assessments of reliable risks in processes differ by no more than 25% among the participants of one cluster.
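One possible reading of this condition is sketched below. It assumes that each participant is described by four per-process risks normalized to the range 0 to 1, and that two participants may share a cluster when their risks differ by at most 0.25 in every process. The greedy assignment order, the function names, and the profiles are hypothetical and do not reproduce the data of Table 3.

```python
def compatible(a, b, tolerance=0.25):
    """True if two risk profiles differ by at most `tolerance` in every process."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


def threshold_clusters(profiles, tolerance=0.25):
    """Greedily place each participant in the first cluster whose members are all compatible."""
    clusters = []
    for pid, profile in profiles.items():
        for cluster in clusters:
            if all(compatible(profile, profiles[m], tolerance) for m in cluster):
                cluster.append(pid)
                break
        else:
            # No existing cluster fits, so this participant starts a new one.
            clusters.append([pid])
    return clusters


# Hypothetical normalized risks for processes 1 to 4 (illustrative only).
profiles = {
    "A": (0.50, 0.75, 0.50, 0.25),
    "B": (0.50, 0.50, 0.75, 0.25),
    "C": (1.00, 0.75, 0.50, 1.00),
    "D": (0.75, 1.00, 0.50, 0.75),
}
print(threshold_clusters(profiles))
```

A greedy pass depends on the order of participants, so it only illustrates the compatibility condition; it does not optimize a clustering quality criterion.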
The results of solving the clustering problem are given in Table 4.
Table 4 shows that the number of individual solutions for which specialized menus should be made was reduced from 10 to 3. Participant 8 is assigned to a separate cluster 1 (integral risk of 0.46). Participants 9, 4, 2, 7 and 5 are assigned to cluster 2 (integral risk in the range of 0.60 to 0.71), and participants 3, 10, 6 and 1 are assigned to cluster 3 (integral risk in the range of 0.76 to 0.84).
Table 4 also provides the integral risk as a conditional value from 0 to 1 for each member of the group, where 0 corresponds to the presence of polymorphisms in the homozygous safe form in all 14 genes in the alleles under study, and 1 corresponds to the presence of polymorphisms in the homozygous disease-predisposing form in each of the 14 genes.
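The conditional value from 0 to 1 can be illustrated with a minimal sketch that divides a participant's total points by the maximum attainable total. The function name `normalized_risk` is hypothetical, and the number of scored items is left as a parameter because the exact count used for Table 4 is an assumption here.

```python
def normalized_risk(total_points: int, n_items: int, max_points_per_item: int = 2) -> float:
    """Scale a total score to [0, 1]: 0 = all items safe, 1 = all items predisposing."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    max_total = n_items * max_points_per_item
    if not 0 <= total_points <= max_total:
        raise ValueError("total_points is outside the attainable range")
    return total_points / max_total


# Hypothetical totals over 14 scored items (illustrative only).
print(normalized_risk(0, 14))
print(normalized_risk(14, 14))
print(normalized_risk(28, 14))
```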
Intelligent data processing with clustering methods makes it possible to model a personalized optimal diet for a participant on the basis of medical indicators by minimizing the risk function. As can be seen in Table 4, the NAT2 and APOA5 genes make the greatest contribution to the risks of hereditary diseases for people assigned to cluster 3. Therefore, the ration for the cluster 3 consumer group must take into account the corrected nutritional requirements associated with these genes. The NAT2 gene is responsible for the detoxification of xenobiotics. It reduces the enzymatic activity of a number of enzymes and provokes colon and bladder cancer. In this regard, the diet of participants in cluster 3 should additionally contain food enriched with natural and engineered antioxidants. Since this gene also plays an important role in the detoxification of pesticides and in carcinogenesis, people with a high risk for this gene should prefer organic food and be attentive to products that can accumulate pesticides and heavy metals.
The APOA5 gene regulates the level of α-tocopherol (vitamin E). People with an unfavorable genotype for this gene need to increase their intake of vitamin E by eating more foods with a high content of it.
In cluster 2, the genes contributing most to risk are APOA5 and SR (HTR2A). The SR (HTR2A) gene encodes the synthesis of serotonin, affecting the psychological stability of the consumer. The level of serotonin can be increased by enriching the diet with offal, B-group vitamins, and the macroelements Ca and Mg.
In cluster 1, the genes GSTM1, NBPF3 and APOA5 make the greatest contribution to the risks of hereditary diseases. The cluster 1 participant is recommended to eat foods high in vitamin E, wholemeal bread, bran and nuts.
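The idea of finding the genes that contribute most to the risk within a cluster can be sketched as follows, assuming that the contribution of a gene is the sum of its points over the cluster members. The function name `top_genes` and the scores are hypothetical and are not the values of Table 4.

```python
from collections import Counter


def top_genes(cluster_scores, n=2):
    """Return the n genes with the largest total points over all cluster members."""
    totals = Counter()
    for participant_scores in cluster_scores:
        # Counter.update adds the points of each gene to the running total.
        totals.update(participant_scores)
    # Sort by total (descending), then by gene name for a stable order on ties.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:n]]


# Hypothetical points for two members of one cluster (illustrative only).
cluster = [
    {"NAT2": 2, "APOA5": 2, "FUT2": 0},
    {"NAT2": 2, "APOA5": 1, "FUT2": 1},
]
print(top_genes(cluster))
```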

V. CONCLUSION
Using the genome analysis of the considered consumer group as an example, a methodology was developed for forming target diets based on the multidimensional objects clustering method. The use of Data Mining (the clustering method) makes it possible to construct a balanced daily ration for personalized nutrition. Through the study, collection, compilation and processing of numerical information based on medical indicators, the method reduced the number of rations to be developed from 10 to 3.
Based on the genetic data of the experimental group participants included in a given cluster, the diet developed for the target group should take into account the adjusted physiological needs for nutrients associated with the gene polymorphisms of these participants.
TABLE I.

| Metabolic process encoded by a group of genes | Controlled gene alleles | Corresponding risk |
|---|---|---|
| Phase 1 - Activation | CYP1A1*2B, *4 | Lung cancer, acute leukemia, general oncology, proton pump inhibiting |
| Phase 1 - Activation | CYP2D6*3, *4 | Metabolism of psychotropic drugs, including drugs of a narcotic series |
| Phase 1 - Activation | CYP2C9*2, *3 | Metabolism of antidepressants, β-adrenoreceptor blockers |
| Phase 1 - Activation | CYP2C19*2 | Metabolism of some pharmaceuticals, including proton pump inhibitors |
| Phase 2 - Detoxification | GSTT1 | Bowel cancer. Encodes the synthesis of the enzyme glutathione-S-transferase. Activates glutathione |
| Phase 2 - Detoxification | GSTM1 | Bowel cancer. Encodes the synthesis of glutathione-S-transferase. Activates glutathione |
| Phase 2 - Detoxification | NAT2 | Encodes the enzymes responsible for the catalysis of aromatic xenobiotics by acetylation. Determines the rate of occurrence of a malignant neoplasm of the walls of the bladder and rectum |
| Phase 2 - Detoxification | TPMT | Responsible for the synthesis of the enzyme thiopurine-S-methyltransferase, which is associated with the processes of detoxification of the body |
| Vitamin metabolism | NBPF3 (ALPL) | The risk of reducing the concentration of vitamin B6 |
| Vitamin metabolism | FUT2 | The risk of reducing the absorption of vitamin B12 |
| Vitamin metabolism | BCMO1 | Risk of disorders in vitamin A synthesis from β-carotene |
| Vitamin metabolism | APOA5 | Risk of low levels of α-tocopherol (vitamin E) |
| Psycho-emotional status | DRD-2A | |

TABLE II.
ESTIMATION OF RELIABLE RISKS OF THE PROBABILITY OF DEVELOPING DISEASE CONDITIONS BY SELECTED METABOLIC PROCESSES, EXPRESSED IN POINTS (HIGH PROBABILITY: 2 POINTS, MEDIUM: 1 POINT, LOW: 0 POINTS)

TABLE IV.
THE RESULT OF COMBINING OBJECTS (PARTICIPANTS) INTO CLUSTERS