Grey Clustering Method for Water Quality Assessment to Determine the Impact of Mining Company, Peru

Mining operations have a significant impact on the environment, and water quality is one of the most affected aspects that needs to be controlled. The Grey Clustering Method based on center-point triangular whitenization weight functions (CTWF) is an artificial intelligence criterion that evaluates water samples according to selected parameters in order to carry out an effective water quality assessment. In the present study, the analysis covers the Crisnejas River Basin, using fifteen monitoring points from an investigation carried out by the National Water Authority (ANA) in 2019 and the Peruvian water quality standards (ECA). The results reveal that almost all of the monitoring points on the Crisnejas River Basin were classified as “irrigation of vegetables unrestricted”, and only one point, located in an urbanized area, was classified as “animal drink”. This implies that mining discharges are being well treated by the company, whereas the contamination generated in towns is a separate problem. Further, the present study might be helpful for audit processes carried out by the state or companies, to justify the quality of surface waters using a more accurate methodology.
Keywords: Grey clustering method; mining company; water quality; artificial intelligence


I. INTRODUCTION
Yanacocha is a well-known mining district in Cajamarca, Perú, characterized by extensive disseminated gold mineralization [1]. The operations and processes used to obtain this metal have an important environmental impact on surface waters such as rivers, which are an important water source for people, livestock and plantations [2]. Consequently, water quality needs to be analyzed to ensure that the concentrations of diverse contaminants are within the permitted limits.
For this reason, a grey clustering evaluation method is proposed, as used in other surface water quality analyses for rivers [3] [4]. The method is based on grey systems theory, which overcomes the lack of information about the evaluated objects [5]. Determining the whitenization weight function is also important; here, the center-point triangular whitenization weight function (CTWF) is applied [6].
Yanacocha is located in Cajamarca at 3600 m.a.s.l., in the northern part of Perú. It lies at the top of the Crisnejas River Basin [7], whose watershed feeds the rivers that flow downstream through the city of Cajamarca and nearby communities dedicated to livestock [8]. The rivers therefore act as transport agents that, depending on their shape, flow rate, width, drainage area and topography, dilute or concentrate a variety of contaminants.
For the study, we used 15 monitoring points sampled in the field by ANA [7], together with Peruvian law [9]. The specific objective was to obtain a more accurate classification based on grey clustering, applied to the Crisnejas River Basin, where the results obtained with this methodology determine the degree of negative impact of upstream mining. Furthermore, this study can help people who live along the watershed to have more control over water management.
The study is structured as follows: literature review, methodology of the grey clustering system, case study, results and discussion, and conclusions.

II. LITERATURE REVIEW
Grey systems theory focuses on the effective processing of available data in order to deal with the uncertainty they present. It therefore provides a firm methodological foundation for this paper, since water quality monitoring and assessment are highly influenced by uncertainty [10]. Another paper, on the quality of the Rio Cau, provided a way to characterize a river in order to analyze the decision-making process. Furthermore, the Crisnejas River Basin assessment can serve as a model for other mining projects in terms of water resource management [11].
In "The Use of Grey Systems Theory to Analyze the Water Supply Systems Safety", the grey clustering method based on grey systems theory is used to evaluate water quality with artificial intelligence criteria. This is because, as noted in other papers, limitations in data collection restrict water supply companies when extending the risk matrix analysis in water safety plans [12]. The evaluation uses the monitoring data published by the National Water Authority (ANA) and the parameters established by MINAM-Peru (DS N° 015-2015), which will be taken into account for this work. That study focuses on evaluating the quality of the Santa River and how it can be used for consumption by the population through different types of water treatment. In contrast, our study focuses on the management of the water resources of the Yanacocha mine, based on the evaluation of the water quality of the Crisnejas River Basin, with the purpose of serving as a model for current and future mining projects nearby [5].
"Water Quality Assessment using the Grey Clustering Analysis on a river of Taxco, Mexico" evaluates the impact of mine wastewater on a river in Taxco (Mexico). The grey clustering classification method was used to evaluate water samples at 4 different points. This work also shows how proximity to the mining operation affects the degree of contamination. A similar study, "Water Quality in Areas Surrounding Mining: Las Bambas, Peru", used the Grey Clustering Method to evaluate the impact in the area that includes the Challhuahuacho and Ferrobamba rivers, where the Las Bambas mine operates [13]. This information is important for establishing the relationship between mining and water pollution, which is decisive because in the present case study the Crisnejas River is close to mining operations [14].
In the article "Water Quality Assessment of the Mining-Impacted Elqui River Basin, Chile" [15], the water quality assessment uses multivariate data analysis to characterize the main impacts (mining, agriculture and hydrothermal pollution) on the Elqui River in Chile. Factorial indices such as mining pollution and salinization also help to highlight and identify the sections of the river that were most affected. For the present study, it is useful to consider statistical methods such as principal component analysis to determine the threats to the Crisnejas River Basin.
According to the paper "Hydrochemical evaluation of the influences of mining activities on river water chemistry in central northern Mongolia" [16], different concentrations of studied parameters give information about the setting of potential environmental activities, like mining and erosion processes. In the present work, a similar situation is studied by analyzing a river basin that is connected at the top of the watershed with a mine, and downhill with agriculture and livestock zones.
Additionally, the article "Finding water quality trend patterns using time series clustering: A case study" [17] explains the use of time series clustering to find water quality trend patterns in Zhejiang Province, China. In this way, geographically distant regions that may present similar patterns in certain physicochemical parameters can be analyzed together. As a result, the root causes of water pollution by anthropogenic factors become easier to identify.

III. METHODOLOGY
Grey systems theory is a methodology used in studies with small samples or a lack of information. The grey clustering evaluation method based on CTWF is used to evaluate water quality, as in previous studies [18].

A. First Step: Setting of Central Points
The central points are calculated using a water quality standard as delimitation points. Because the ranges must be converted into the three Grey Classes (λ1, λ2 and λ3) used in Peruvian regulation, the central points are calculated as averages and limits.

C. Third Step: Set the Grey Functions or Triangular Functions
The functions will be defined under the parameters established in the previous step using Grey System functions (as shown in Fig. 1). The categories assigned to each function are those established in the Peruvian DS:
• $y = f_j^1$ = A1 = water for unrestricted vegetable irrigation
• $y = f_j^2$ = A2 = water for restricted vegetable irrigation
• $y = f_j^3$ = A3 = water for animal drinking
Under these conditions the functions are as shown in (2)-(4).
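The three triangular functions can be sketched in Python as follows. This is a minimal illustration that assumes the commonly used center-point form: full membership in A1 at or below the first central point, a symmetric triangle for A2, and full membership in A3 at or above the last central point. The function name `ctwf` and the central points in the usage line are hypothetical, and the sketch assumes central points in ascending order (a parameter for which higher values mean better quality would need its own treatment).

```python
def ctwf(x, centers):
    """Return the CTWF value of x for each grey class.

    centers: ascending central points [lambda_1, ..., lambda_n] (assumed form).
    """
    lam = list(centers)
    n = len(lam)
    values = []
    for k in range(n):
        if k == 0:
            # First class: 1 up to lambda_1, then linear descent to 0 at lambda_2.
            if x <= lam[0]:
                f = 1.0
            elif x < lam[1]:
                f = (lam[1] - x) / (lam[1] - lam[0])
            else:
                f = 0.0
        elif k == n - 1:
            # Last class: linear rise from lambda_(n-1), then 1 from lambda_n on.
            if x >= lam[-1]:
                f = 1.0
            elif x > lam[-2]:
                f = (x - lam[-2]) / (lam[-1] - lam[-2])
            else:
                f = 0.0
        else:
            # Middle classes: triangle that peaks at lambda_k.
            if lam[k - 1] < x <= lam[k]:
                f = (x - lam[k - 1]) / (lam[k] - lam[k - 1])
            elif lam[k] < x < lam[k + 1]:
                f = (lam[k + 1] - x) / (lam[k + 1] - lam[k])
            else:
                f = 0.0
        values.append(f)
    return values


# Hypothetical non-dimensional central points, for illustration only.
print(ctwf(0.75, [0.5, 1.0, 1.5]))
```

A value halfway between the first two central points belongs equally to A1 and A2 and not at all to A3, which is the overlap behaviour the triangular shape is meant to capture.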

D. Fourth Step: Determination of the Weight for each Criterion
In this step the clustering weight of the grey class parameters will be determined using the harmonic mean method expressed in (5).
Where $f_j^k(x_{ij})$ is the value from the CTWF and $\eta_j^k$ is the weight for each parameter.
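The harmonic mean weighting can be sketched as follows, assuming the usual form in which the weight of a criterion for a given class is the reciprocal of its central point divided by the sum of reciprocals over all criteria for that class. The function name `harmonic_weights`, the criterion names and the central points are hypothetical and are not values from this study.

```python
def harmonic_weights(centers):
    """Weight of each criterion per grey class (assumed harmonic mean form).

    centers: dict mapping criterion name -> list of central points, one per class.
    """
    n_classes = len(next(iter(centers.values())))
    weights = {name: [] for name in centers}
    for k in range(n_classes):
        # Sum of reciprocals of the k-th central point over all criteria.
        inv_sum = sum(1.0 / lam[k] for lam in centers.values())
        for name, lam in centers.items():
            weights[name].append((1.0 / lam[k]) / inv_sum)
    return weights


# Hypothetical non-dimensional central points for two criteria.
example_centers = {"crit_a": [0.5, 1.0, 1.5], "crit_b": [0.25, 1.0, 1.75]}
example_weights = harmonic_weights(example_centers)
print(example_weights)
```

With this form, the weights for each grey class sum to one across criteria, and a criterion with a smaller central point for a class receives a larger weight for that class.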

F. Sixth Step: Determination of the Max Coefficient
Finally, (7) is applied to determine the category of each monitoring point from the maximum value of the clustering coefficient.
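The last two steps can be sketched together, assuming the clustering coefficient of a monitoring point for each class is the weighted sum of its CTWF values over all criteria, and that the point is assigned to the class with the largest coefficient. The function name `classify_point` and the input numbers are hypothetical; only the labels A1, A2 and A3 come from the text.

```python
def classify_point(memberships, weights, labels=("A1", "A2", "A3")):
    """Return (clustering coefficients, label of the winning class).

    memberships[j][k]: CTWF value of criterion j for class k.
    weights[j][k]:     weight of criterion j for class k.
    """
    n_classes = len(labels)
    # Clustering coefficient per class: weighted sum over criteria (assumed form).
    sigma = [
        sum(m[k] * w[k] for m, w in zip(memberships, weights))
        for k in range(n_classes)
    ]
    # The point belongs to the class with the maximum coefficient.
    best = max(range(n_classes), key=lambda k: sigma[k])
    return sigma, labels[best]


# Hypothetical CTWF values and weights for two criteria at one point.
example_memberships = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
example_weights = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(classify_point(example_memberships, example_weights))
```

If two classes tie, `max` returns the first one in label order; this tie-breaking choice is an assumption of the sketch, not something stated in the method.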

IV. CASE STUDY
The analysis of surface water quality was carried out in the Cuenca Crisnejas - Subcuenca Cajamarquino - ALA Cajamarca, located in the northern part of Perú [7]. The proximity of the watershed to the Minera Yanacocha project, and the project's influence on it, make this area important to analyze. Three principal rivers were considered for the study: the Mashcon, Chonta and Cajamarquino rivers.

A. Definition of Study Objects
The study conducted by the "ALA Cajamarca" and the "Área Técnica de la Autoridad Administrativa del Agua V1 Marañón", considered thirty monitoring points along the watershed. However, for the study we took only fifteen monitoring points (as shown in Table I), as they were considered to be strategically located (see Fig. 2).

B. Definition of Evaluation Criteria
The evaluation criteria used for the present study are water quality parameters taken from the study made by "ALA Cajamarca" and the "Área Técnica de la Autoridad Administrativa del Agua V1 Marañón" (shown in Table II). In addition, the criteria selected are chemically correlated to the type of deposit of Minera Yanacocha, which is a High Sulphidation Epithermal gold deposit.

C. Definition of Grey Classes
The definition of Grey Classes was based on the criteria of ECA 2017 (Table III) [9]. The analysis was made for Category 3: Watering of vegetables and drink for animals, because Minera Yanacocha treats its waters for agriculture and livestock. In this study it is taken into account that the "not restricted" irrigation values are more rigorous than those for "restricted" irrigation water, and even more so than those for "drink for animals". Therefore, the categories correspond to λ1, λ2 and λ3, ordered from the highest to the lowest water quality.
1) First step: The central points are set (Table IV). For example, the higher and lower values are averaged to obtain a middle index.
2) Second step: Non-dimensional conversion:
• Standard Values: For each criterion, an average is calculated over the three Grey Classes. The new non-dimensional standard value is then the original value divided by that average (Table V).
• Monitoring Points: Using the averages calculated above, the new non-dimensional value of each monitoring point is the original value divided by the corresponding average (Table VI).
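The non-dimensional conversion described in the second step can be sketched as follows. The function name `to_nondimensional`, the criterion name "X", the point name "P1" and all numbers are hypothetical placeholders, not values from Tables V or VI.

```python
def to_nondimensional(standards, samples):
    """Divide standard values and sampled values by each criterion's class average.

    standards: dict criterion -> list of standard values, one per grey class.
    samples:   dict monitoring point -> dict criterion -> measured value.
    """
    # Average of the grey class standard values for each criterion.
    averages = {c: sum(v) / len(v) for c, v in standards.items()}
    nd_standards = {
        c: [x / averages[c] for x in v] for c, v in standards.items()
    }
    nd_samples = {
        point: {c: x / averages[c] for c, x in values.items()}
        for point, values in samples.items()
    }
    return nd_standards, nd_samples


# Hypothetical standard values and one hypothetical measurement.
example_standards = {"X": [2.0, 4.0, 6.0]}
example_samples = {"P1": {"X": 3.0}}
print(to_nondimensional(example_standards, example_samples))
```

Dividing both the standards and the measurements by the same average puts criteria with different units on a comparable scale before the triangular functions are applied.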

3) Third step:
The values (λ1, λ2, λ3) of each parameter (pH, OD and others) are substituted in (8)-(10) to obtain the functions used to evaluate all the monitoring points. The functions for the dissolved oxygen parameter (OD) are presented here; the other parameters are developed in the same way.
The following tables present the evaluation of the parameters at the monitoring points established at the beginning. Table VII shows the non-dimensional values.

4) Fourth step:
As already mentioned in the methodology, below (Table VIII) are the weights for each parameter using the harmonic mean method through (5).

5) Fifth step:
The clustering coefficient was calculated using (6). Table IX shows the results for each parameter.

6) Sixth step:
Finally, the highest clustering coefficient for each point is chosen, which determines the grey class to which the point belongs, according to (7). The results for each point are shown in Table X.

V. RESULTS AND DISCUSSION
A. About the Case Study
In this study, grey clustering was used to classify the monitoring points sampled by ANA from 5 to 10 April 2019. They were classified according to Peruvian legislation under ECA Category 3, "Irrigation of vegetables and animal drink", which is divided, from most to least rigorous, into A1, A2 and A3. Table X indicates the classification of each monitoring point according to the category assigned. Fourteen points belong to the category of water for "unrestricted irrigation", which allows its use for irrigating food crops that can be in direct contact with the water and can be consumed raw. On the other hand, one monitoring point (G5) belongs to "animal drinking" waters, which are used for drinking by large animals such as cattle, horses or camelids, and by smaller animals such as pigs, sheep, goats, guinea pigs, birds and rabbits. Fig. 3 shows that most of the monitoring points, even those close to the mines, present good water quality with few variations in physicochemical parameters such as pH. However, point G5, further downstream on the Mashcon River, is located near an urbanized area and has low water quality because of the impact of direct sewage discharges. This can be verified by its high COD value [19].

B. About the Methodology
By way of comparison, the method used accounts for uncertainty in its analysis, unlike other methods such as Delphi or the Analytic Hierarchy Process (AHP), which do not include it in their development. This is important for the study topic, since the graded evaluation improves accuracy and decreases uncertainty.
One of the advantages of using grey clustering is the simplicity of the mathematical modeling: beyond the results themselves, the entire process (using the CTWF) is simple to understand [20]. In addition, applying weights by means of the harmonic mean method makes the study more straightforward. The method is also very useful because it allows the sample data to be both classified and weighted.
On the other hand, the analysis is subject to the legislation of the country in which the study is carried out, which may or may not be well defined. The resulting variability is a disadvantage when the results are to be compared with other studies. Finally, the subject of study involves real-world conditions that are constantly altered or highly changeable within the environment, so decisions will have to be made about negative anomalies detected in a process that is dynamic over time [21]. Nevertheless, Grey Clustering is an effective methodology for environmental impact studies on surface water quality [22] [23].

VI. CONCLUSION
From the results obtained in the present study, it can be determined that in general there is good water quality in all areas, except at one point. This monitoring point is located in the urbanized area and is polluted by sewage. In addition, the monitoring points related to the Yanacocha mine show waters that qualify for irrigating food crops that are consumed raw.
With regard to the method used for this research, the Grey Clustering Method is one of the most effective options for classifying water monitoring points based on the information provided by the National Water Authority (ANA) and the parameters (ECA) established in Peruvian legislation. Unfortunately, the available information provides only values, without well-established parameters and limits. Even though the samples taken lack certainty, the method accounts for that flaw and mitigates it. The mathematical advantage of the approach was also evident, since it is simple to apply triangular functions and to use the harmonic mean to establish the weights of the parameters. However, Excel spreadsheets are not efficient for the calculations required by this method; a programming language should be used, or at least the Visual Basic environment included in Excel.
Although the results indicate that water resource management is being carried out efficiently, this study monitored only random points belonging to the Crisnejas River Basin. For this reason, further investigations are proposed that take into account the discharge points of the Yanacocha mine, in order to verify that the maximum permissible discharge limits are being met. This is suggested because pollutants may be diluted by other effluents which, when they converge with effluents directly affected by the Yanacocha mine, mask the real water quality problem.
In addition, other studies of this area could be carried out using other methods similar to grey clustering in order to obtain another perspective and corroborate the results.