Applying Grey Clustering and Shannon’s Entropy to Assess Sediment Quality from a Watershed

The evaluation of sediment quality is a complex issue in Peru, mainly because there is no national sampling protocol or standard for comparison, so sediments are often assessed without a comprehensive analysis of their quality. In the present study, the quality of the sediments in the upper basin of the Huarmey river was evaluated at 30 monitoring points for 7 parameters: arsenic, cadmium, copper, chromium, mercury, lead and zinc. These were compared against the standards recommended by the Canadian Environmental Quality Guidelines (CEQG, 2002), Sediment Quality Guidelines for the Protection of Aquatic Life (freshwater), of the Canadian Council of Ministers of the Environment (CCME). The evaluation, by the grey clustering method and Shannon entropy, showed that 13 monitoring points had good sediment quality, 1 monitoring point had moderate quality and 16 monitoring points had poor quality; therefore, it can be concluded that the effluents and discharges of the mining activities in the area have a negative impact on environmental quality. Finally, the results can be of great help to OEFA, the regional government, the municipalities and any other body with oversight functions, since they will allow them to make more objective and precise decisions.
Keywords: Grey clustering; sediment quality; Shannon entropy


I. INTRODUCTION
The water resources of the Huarmey river basin in Ancash, Peru, are a vital element for population supply and for agricultural, livestock, mining, energy and ecological use [1], so their optimal, rational and sustainable use is important. However, following continuous complaints from the surrounding population about an alleged environmental impact on the water and sediment of the upper basin of the Huarmey river, environmental monitoring was carried out in the area, where there is a record of mining activity in the exploration and operation stages (Minera Huinac SAC) [2].
The sediment quality assessment was carried out in the upper basin of the Huarmey river, in the districts of La Merced, Aija, Huacllan and Succha, province of Aija, department of Ancash, Peru, using 30 monitoring points sampled by the Organization for Environmental Assessment and Enforcement (OEFA), a specialized technical organization in charge of enforcing compliance with Peruvian environmental regulations [2]. It is important to mention that, at present, Peru has neither a sediment sampling protocol nor regulations or quality standards to evaluate this component. Therefore, the standards recommended by the Canadian Environmental Quality Guidelines (CEQG, 2002), Sediment Quality Guidelines for the Protection of Aquatic Life (freshwater), of the Canadian Council of Ministers of the Environment [3] are used.
For the evaluation of sediment quality, we use the Grey Clustering method together with Shannon entropy. Grey Clustering is a methodology based on the theory of fuzzy sets that can be applied through grey incidence matrices or whitenization functions; unlike traditional statistical methods [4], it considers within its analysis the fuzzy-type uncertainty present in the environment. For the case study, the center-point triangular whitenization weight functions (CTWF) are used, since the CTWF method is mainly applied to test whether the observed objects belong to predetermined classes, known as grey classes [5], as evidenced in studies on the selection of innovative strategies [6] and on the evaluation of sediment quality by grey incidence [7]. The Shannon entropy method, in turn, is an artificial intelligence approach developed by Claude E. Shannon (Shannon and Weaver, 1949) that addresses the uncertainty due to the dispersion of the data [8]; it is used here to determine the weights of the evaluation criteria within the CTWF method [9]. The specific objective of the present work is therefore to analyze and assess the quality of sediments in the upper basin of the Huarmey river, monitored in May 2016, by the Grey Clustering method and Shannon entropy, based on the standards recommended by the Canadian Environmental Quality Guidelines (CEQG, 2002), Sediment Quality Guidelines for the Protection of Aquatic Life (freshwater), of the Canadian Council of Ministers of the Environment [3]. The remainder of the study is organized as follows: Section II summarizes the literature review; Section III explains the CTWF method in detail; Section IV describes the case study; and Section V presents the results and their discussion.
Finally, the conclusions are presented in Section VI.
www.ijacsa.thesai.org

II. LITERATURE REVIEW
Delgado et al., in 2017, studied the water quality of the Santa River, analyzing 21 monitoring points of the basin according to the parameters established by MINAM-Peru (DS No 015-2015). The results showed that 47.6% of the monitoring points presented good water quality for human consumption, which could be purified by disinfection; 33.3% presented moderate quality, which could be purified with conventional treatment; and 19.1% presented low quality, which could be purified by applying a special treatment. It was concluded that the grey clustering method showed interesting results and that it could be applied to other studies on water quality or the environment in general [10].
Delgado, in 2020, in Peru, notes that evaluating the quality of surface waters is a complex issue involving the comprehensive analysis of several parameters that are altered by natural or anthropogenic causes. In this sense, the Grey Clustering method, which is based on Grey Systems theory [11], and Shannon entropy, based on the artificial intelligence approach, provide an alternative for evaluating water quality comprehensively while considering the uncertainty within the analysis. In that study, the water quality in the upper basin of the Huallaga River was evaluated using the results of the monitoring of twenty-one points carried out by the National Water Authority, analyzing nine parameters of the Prati index. The results showed that all the monitoring points of the Huallaga River were classified as uncontaminated, which suggests that the discharges generated by economic activities pass through treatment plants that meet the quality parameters [12].
The Environmental Assessment and Control Agency (OEFA by its Spanish acronym), in Peru, carried out water and sediment quality monitoring from May 20 to 30, 2016, at 30 monitoring points in the upper basin of the Huarmey River. This basin is formed by the Llactún and La Merced rivers with their respective tributaries, which join to form the Santiago River; the latter receives the contribution of the stream of the same name, where there is a record of mining activity in the exploration and operation stages (Minera Huinac S.A.C.). Downstream it takes the name of the Aija River, which receives the contributions of the Mallqui and Allma rivers. The reference values of the Canadian standard were exceeded for total arsenic at all 30 monitoring points, copper at 26 points, mercury at 10 points, lead at 18 points and zinc at 25 points [2].
Chu and Tan, in 2014, in China, analyzed 39 samples of surface sediments from the coastal ocean of Jiangsu to evaluate their quality. They used the Grey Clustering method and classified the results into three categories (clean, light pollution and heavy pollution): of the thirty-nine samples, eleven were clean, twenty showed light pollution and eight showed heavy pollution. Regarding the underlying reasons, pollutants dumped into the sea due to increased industrial and agricultural activities contributed to the pollution. Therefore, more emphasis should be placed on the management of the surface tidal flat sediment environment, especially on the treatment of the pollution source, to improve the sediment quality for the sustainable development of the coastal zone [7].
Delgado, in 2018, in Peru, states that the assessment of air pollution and air quality is a serious problem for big cities, given the increasing pollution of the air. In this sense, evaluating this problem with the grey clustering method, which is based on grey system theory, has great advantages since it considers uncertainty within the analysis. In that study, air quality was evaluated at three monitoring points located in three different districts of the city of Lima, Peru: San Martín de Porres, Carabayllo and Puente Piedra. The results revealed that the three monitoring points presented good air quality in accordance with Peruvian law. Nevertheless, this could be because the districts in which the monitoring points are located are relatively new. Finally, the results of that study could help local and central authorities to make better decisions on the evaluation of air quality [5].
The National Water Authority (ANA by its Spanish acronym), in November 2015, evaluated the quality of the surface water of the Huarmey river basin. It concluded that the affected water bodies belonging to, or tributary to, the Huarmey basin were located in the upper area of the basin: the Montecristo, Huinac, Hercules and Santiago streams, as well as the Llactún river. These presented concentrations of metals such as aluminum, arsenic, cadmium, copper, iron, manganese, lead and zinc that exceed the values established in the ECA-Water (the Peruvian environmental quality standards for water), because mining companies are installed in this area at the head of the basin [13].
The application of the Grey Clustering and Shannon Entropy methodology to sediment quality is innovative, given the scarce existing literature applying similar methodologies and the absence of sediment quality standards in Peru. The importance of the present work therefore lies in applying these methodologies to a new field of study in the Peruvian context; in addition, it makes it possible to assess the quality of an environmental component of the upper basin of the Huarmey River, Peru.

III. METHODOLOGY
In this section, we describe the center-point triangular whitenization weight functions (CTWF) method. First, assume that there is a set of $m$ objects, a set of $n$ criteria and a set of $s$ grey classes, according to the sample value $x_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$). The CTWF method is summarized in the flowchart in Fig. 1 and is developed through the following steps [12], [14]-[16].

A. Step 1: Determining the Center Points
The criteria ranges are divided into three grey classes, with center points $\lambda_1$, $\lambda_2$ and $\lambda_3$; these values are determined using the Canadian sediment standard.

B. Step 2: Dimension Removal
At this point it is assumed that there are $m$ objects for evaluation and $n$ criteria, which form the matrix $X = \{x_{ij};\ i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n\}$. The matrix is normalized for each criterion $C_j$ ($j = 1, 2, \ldots, n$); the normalized value is calculated by (1).
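The dimension removal step can be sketched in Python as follows. This is a minimal illustration, assuming (as the case study in Section IV suggests) that each value is divided by the arithmetic mean of the standard values of its criterion; the function name `remove_dimension` and all numbers are hypothetical and are not taken from the tables of this study.

```python
from statistics import mean


def remove_dimension(values, standards):
    """Divide each value by the mean of its criterion's standard values.

    values:    rows of measurements, one column per criterion
    standards: rows of standard values (one row per grey class),
               one column per criterion
    Assumption: normalization by the per-criterion mean of the standards.
    """
    n = len(standards[0])
    col_means = [mean(row[j] for row in standards) for j in range(n)]
    return [[row[j] / col_means[j] for j in range(n)] for row in values]


# Hypothetical standards for two criteria (rows: good, moderate, poor).
standards = [[2.0, 10.0], [4.0, 20.0], [6.0, 30.0]]
# Hypothetical measurements at two monitoring points.
samples = [[4.0, 10.0], [8.0, 40.0]]

norm_standards = remove_dimension(standards, standards)
norm_samples = remove_dimension(samples, standards)
print(norm_standards)
print(norm_samples)
```

Normalizing the standards and the samples with the same divisor keeps both on a common dimensionless scale, so criteria measured in different ranges can be compared.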

C. Step 3: Determination of Triangular Functions
The grey classes are expanded in the direction of each parameter used; for this purpose the Canadian standard for sediments is used as a reference, since it provides values to measure the quality. In this research the Canadian standard provides three quality levels for each parameter analyzed, so there are three functions for each parameter. This yields the new sequence of center points. For grey class $k$ of criterion $j$ and an observed value $x_{ij}$, the center-point triangular whitenization functions (CTWF) are computed by (2)-(4). A visual representation is shown in Fig. 2.
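A possible form of the three whitenization functions is sketched below in Python. It is not a transcription of (2)-(4): it assumes that the first class takes the value 1 below its center point, the last class takes the value 1 above its center point, and each class decays linearly to 0 at the neighbouring center points. The function name `ctwf` and the center points in the example are hypothetical.

```python
def ctwf(x, centers):
    """Return the whitenization value of x for each grey class.

    centers: increasing center points, one per grey class.
    Assumption: end classes saturate at 1 beyond their own center;
    all other edges are linear between adjacent center points.
    """
    s = len(centers)
    memberships = []
    for k in range(s):
        c = centers[k]
        left = centers[k - 1] if k > 0 else None
        right = centers[k + 1] if k < s - 1 else None
        if x <= c:
            if left is None:
                f = 1.0                         # first class saturates
            elif x <= left:
                f = 0.0
            else:
                f = (x - left) / (c - left)     # rising edge
        else:
            if right is None:
                f = 1.0                         # last class saturates
            elif x >= right:
                f = 0.0
            else:
                f = (right - x) / (right - c)   # falling edge
        memberships.append(f)
    return memberships


# Hypothetical dimensionless center points for good, moderate, poor.
centers = [0.5, 1.0, 1.5]
print(ctwf(0.75, centers))  # halfway between the first two centers
```

With this construction, a value lying between two adjacent center points has memberships that sum to 1, split linearly between the two neighbouring classes.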

D. Step 4: Determining Weight for each Criterion
In this step, Shannon's entropy weight method is used. Shannon developed the measure $H$, which satisfies the following properties [9], [12], [14], [15], [17]:
- $H$ is a positive continuous function.
- If all $p_i$ are equal, $p_i = 1/n$, then $H$ should be a monotonically increasing function of $n$.
- For all $n \geq 2$, $H(p_1, p_2, \ldots, p_n) = H(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)\, H\left(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2}\right)$.
Shannon showed that the only function satisfying these conditions is given by (5).
$$H = -k \sum_{i=1}^{n} p_i \log p_i \tag{5}$$
where $k$ is a positive constant, $0 \leq p_i \leq 1$ and $\sum_{i=1}^{n} p_i = 1$.
The entropy weight methodology can be described according to the following definition [9], [12], [15]. Assume that there are $m$ objects for evaluation and $n$ evaluation criteria, which form the matrix $Z = \{z_{ij};\ i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n\}$. The following steps are then carried out.

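1) The matrix $Z = \{z_{ij}\}$ is normalized for each criterion $C_j$ ($j = 1, 2, \ldots, n$). The normalized values $P_{ij}$ are calculated by (6).
$$P_{ij} = \frac{z_{ij}}{\sum_{i=1}^{m} z_{ij}} \tag{6}$$
2) The entropy $H_j$ of each criterion $C_j$ is calculated by (7).
$$H_j = -k \sum_{i=1}^{m} P_{ij} \ln P_{ij} \tag{7}$$
where $k$ is a constant, $k = 1/\ln(m)$.
3) The degree of divergence $div_j$ of the intrinsic information of each criterion $C_j$ is calculated by (8).
$$div_j = 1 - H_j \tag{8}$$
4) The entropy weight $w_j$ of each criterion $C_j$ is calculated by (9).
$$w_j = \frac{div_j}{\sum_{j=1}^{n} div_j} \tag{9}$$
where $w_j$ is taken as the clustering weight $\eta_j$ of criterion $j$.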
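The four entropy weight steps can be sketched in Python as follows. This is an illustrative implementation of the standard entropy weight procedure (column normalization, entropy with constant 1/ln(m), divergence, and normalization of the divergences); the function name `entropy_weights` and the input matrix are hypothetical.

```python
import math


def entropy_weights(z):
    """Entropy weights for the columns (criteria) of matrix z.

    z: rows are objects, columns are criteria; all entries positive.
    Note: if every column is perfectly uniform, all divergences are 0
    and the final division is undefined; this sketch does not handle
    that degenerate case.
    """
    m, n = len(z), len(z[0])
    k = 1.0 / math.log(m)                    # entropy constant
    divergences = []
    for j in range(n):
        total = sum(z[i][j] for i in range(m))
        p = [z[i][j] / total for i in range(m)]          # normalized column
        h = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - h)                      # divergence
    d_sum = sum(divergences)
    return [d / d_sum for d in divergences]              # weights


# Hypothetical matrix: first criterion uniform, second dispersed.
z = [[1.0, 1.0],
     [1.0, 3.0]]
w = entropy_weights(z)
print(w)
```

In this example the first criterion takes the same value for every object, so it carries no discriminating information and receives a weight close to zero, while the dispersed second criterion receives almost all the weight.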
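E. Step 5: Determining the Clustering Coefficient
The clustering coefficient $\sigma_i^k$ of object $i$ ($i = 1, 2, \ldots, m$) with respect to the grey class $k$ ($k = 1, 2, \ldots, s$) is calculated by (10).
$$\sigma_i^k = \sum_{j=1}^{n} f_j^k(x_{ij}) \cdot \eta_j \tag{10}$$
where $f_j^k(x_{ij})$ is the CTWF of the $k$-th grey class of the $j$-th criterion, and $\eta_j$ is the weight of criterion $j$; these weights are established with the Shannon entropy method.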
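As a brief sketch, the clustering coefficient is a weighted sum of whitenization values over the criteria, computed once per grey class. The function name `clustering_coefficients` and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def clustering_coefficients(memberships, weights):
    """Weighted sum of whitenization values for each grey class.

    memberships[j][k]: whitenization value of criterion j for class k
    weights[j]:        weight of criterion j (weights sum to 1)
    """
    s = len(memberships[0])
    return [
        sum(w * f[k] for f, w in zip(memberships, weights))
        for k in range(s)
    ]


# Hypothetical object with two criteria and three grey classes.
memberships = [[0.5, 0.5, 0.0],   # criterion 1
               [0.0, 0.0, 1.0]]   # criterion 2
weights = [0.25, 0.75]
sigma = clustering_coefficients(memberships, weights)
print(sigma)
```

Because the second criterion has the larger weight and belongs fully to the third class, the third coefficient dominates even though the first criterion points to the first two classes.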

F. Step 6: Results using the Maximum Clustering Coefficient
Finally, $\max_{1 \leq k \leq s} \{\sigma_i^k\} = \sigma_i^{k^*}$ is calculated, and object $i$ is assigned to grey class $k^*$. When several objects fall in the same grey class, they can be ordered according to the magnitudes of their clustering coefficients.
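The assignment and ordering described in this step can be sketched in Python as follows. The names `classify` and `rank_class`, the class labels and the coefficients are hypothetical; in case of a tie, this sketch keeps the first class that reaches the maximum, which is an arbitrary choice.

```python
LABELS = ("good", "moderate", "poor")


def classify(coefficients, labels=LABELS):
    """Return the label and value of the largest clustering coefficient."""
    best = max(range(len(coefficients)), key=lambda k: coefficients[k])
    return labels[best], coefficients[best]


def rank_class(results, label):
    """Objects assigned to `label`, sorted by decreasing coefficient."""
    members = []
    for name, coeffs in results.items():
        cls, value = classify(coeffs)
        if cls == label:
            members.append((name, value))
    return sorted(members, key=lambda item: item[1], reverse=True)


# Hypothetical clustering coefficients for three monitoring points.
results = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.1, 0.1, 0.8],
    "C": [0.9, 0.1, 0.0],
}
print(classify(results["B"]))
print(rank_class(results, "good"))
```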

IV. CASE STUDY

A. Description of the Study Area
The study area is in the upper basin of the Huarmey River, districts of La Merced, Aija, Huacllan and Succha, province of Aija, department of Ancash, Peru, as shown in Fig. 3. The study is based on the water and sediment quality monitoring carried out from May 20 to 30, 2016, at 30 monitoring points by the Organization for Environmental Evaluation and Enforcement (OEFA) [2].

B. Description of Study Objects
For the evaluation of sediment quality in the upper basin of the Huarmey River (La Merced, Aija, Huacllan and Succha districts, province of Aija, department of Ancash), information was collected for 30 monitoring points from Report No. 266-2016 OEFA/DE-SDCA, corresponding to the monitoring carried out from May 20 to 30, 2016 [2]. The points are shown in Table I and Fig. 4.

C. Description of Evaluation Criteria
The evaluation criteria for the present study are determined by the Canadian sediment quality parameters, which are presented in Table II.
TABLE II. SEDIMENT QUALITY ASSESSMENT CRITERIA

D. Definition of Grey Classes
There are three grey classes for the evaluation, based on the sediment quality levels of the Canadian Sediment Quality Guidelines (CSQG) of the Canadian Council of Ministers of the Environment [3]. These guidelines define two limits: the ISQG (Interim Sediment Quality Guideline), the concentration below which no adverse biological effects are expected, and the PEL (Probable Effect Level), the concentration above which adverse biological effects are frequently found. A third, intermediate category, considered Moderate, is proposed; the classes are presented in Table III.
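For orientation only, the two guideline limits can be read as a crisp three-band classification of a single parameter, sketched below in Python. This is not the grey clustering procedure, which weighs all parameters together. The function name `guideline_band` is hypothetical, and the arsenic limits (5.9 and 17.0 mg/kg dry weight) are the values commonly cited for the CCME freshwater sediment guidelines; they should be checked against Table II.

```python
def guideline_band(concentration, isqg, pel):
    """Crisp band of one concentration relative to the ISQG and PEL limits."""
    if concentration < isqg:
        return "good"        # below ISQG: no adverse effects expected
    if concentration > pel:
        return "poor"        # above PEL: adverse effects frequently found
    return "moderate"        # between the two limits


# Assumed CCME freshwater limits for arsenic, in mg/kg dry weight.
ARSENIC_ISQG = 5.9
ARSENIC_PEL = 17.0

for value in (3.0, 10.0, 25.0):      # hypothetical concentrations
    print(value, guideline_band(value, ARSENIC_ISQG, ARSENIC_PEL))
```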

E. Calculations using the CTWF Method
Step 1: Center points
Based on the international standard CSQG, the central values of the parameters to be analyzed were obtained, with the ISQG value taken as the center of the Good class and the PEL value as the center of the Poor class. These values are shown in Table IV.
Step 2: Dimension removal
The non-dimensional values for each parameter, computed from Table III relative to the average concentration of each parameter, are shown in Table V.
Step 3: Determining triangular functions
The values obtained from Table VI were substituted into the equations of the whitenization functions. As an example, the functions for total arsenic are shown in (11)-(13). Their graphic representation is displayed in Fig. 5.
Step 4: Determining weight for each criterion
The clustering weight ($\eta_j$) of each parameter was determined with Shannon's entropy. The following procedure was followed for this purpose.
1) The parameter values from the Canadian standard were standardized; they are presented in Table VII.
2) The entropy ($H_j$) of each criterion ($C_j$) was calculated through (7). The results are shown in Table VIII.
3) Finally, the entropy weights $w_j$ were found by using (9) and equated to the clustering weights of each parameter. The values are presented in Table IX.
Step 5: Determining the clustering coefficient
The values of the clustering coefficients were calculated using (10). The results of the first 2 monitoring points are shown in Table X.
Step 6: Results using the maximum clustering coefficient
Finally, we calculate the value of $\max\{\sigma_i^k\}$ over the grey classes for each monitoring point and add a comparison by quality scale, which gives Table XI.

V. RESULTS AND DISCUSSION

A. About the Case Study
Table XI shows that 13 monitoring points (43.3%) had good sediment quality, 1 monitoring point (3.3%) had moderate quality and 16 monitoring points (53.3%) had poor quality. In addition, the quality level of the points can be compared according to the maximum clustering coefficient ($\max \sigma_i^k$). The results are analyzed below for each quality category (good, moderate and poor). For a better understanding, the location of the points is displayed in Fig. 6, with the categories differentiated by color.
- Good category: No adverse biological effects are expected at these points. Monitoring point P11 has the best sediment quality and point P5 the lowest within this category. This may be because the points are located on tributaries of the main river, and because P11 is farther from the bridle path than P5 [2]. The other points in this category also show good quality, possibly because they were sampled on tributaries of the main river.
- Poor category: Adverse biological effects are frequently found at these points. Monitoring point P25 presents the best sediment quality and point P2 the lowest within this category. This could be because the points are located on the main river, where the mines discharge their effluents. Points P2, P3 and P4 are located downstream of and closer to the Huinac mine, which may explain their lower sediment quality. Point P25 is located 100 m downstream of the MTZ S.A.C. mine [2], which appears to have better control of its effluents than the mine affecting points P2, P3 and P4.
In relation to other studies, Chu and Tan [7] applied the Grey Clustering method to evaluate the surface sediment quality of the Jiangsu coastal ocean, and showed that more emphasis should be placed on the management of the surface tidal flat sediment environment, especially on the treatment of the pollution source, to improve the sediment quality for the sustainable development of the coastal zone. In the water quality assessment conducted by Delgado et al. [12] in the Huallaga River basin, all of the 21 monitoring points analyzed were found to be uncontaminated according to the Prati index.
A point to highlight from the previous study is that Shannon entropy was used to determine the weights, unlike the study by Liping et al. [18], where the weights were determined using the arithmetic mean. The present study is therefore characterized by addressing a topic little explored with the Grey Clustering method, the analysis of sediments, while also integrating Shannon entropy to determine the weights, which provides greater objectivity and precision.

B. Proposals for Poor Quality Points
According to the results and discussion of the case study, the sediments are contaminated by heavy metals, which can be attributed to the effluents of the mining companies (mainly Huinac), since the affected monitoring points are downstream of them. It is therefore proposed that the company incorporate constant monitoring of sediment quality, taking into account the natural state of the streams and rivers, which may cross mineralized areas, and, if its responsibility is established, improve the control of its effluents by means of treatment plants.

C. About the Methodology
The Grey Clustering method is a useful methodology for analyzing environmental systems with respect to environmental quality. Among the advantages it shares with fuzzy sets, it can be applied when the internal mechanisms of the system are unknown or the concept to be measured is imprecise, as in the case of the study area. It is a mathematical theory of uncertainty for modeling situations in which traditional instruments do not lead to optimal results because of the uncertainty involved [19]-[21].
Based on the study conducted, the Grey Clustering Analysis (GCA) methodology was chosen for the environmental quality assessment over the Delphi method [22] and the AHP method [23]; the advantages and disadvantages of these methodologies are shown in Table XII. In addition, the Shannon entropy method determines the clustering weights for each criterion objectively, without the need to consult experts, which makes the combined approach more efficient and integrated for sediment quality assessment.

Table XII:

Grey Clustering Methodology
Advantages:
- High degree of flexibility, due to the membership functions, which allows it to include uncertainty in its analysis.
- It presents an objective weighting system.
- It allows an integral analysis between the different thematic areas, by means of fuzzy inference rules.
- It processes a large set of data and reduces its dimensionality with a minimum loss of information [24].
Disadvantages:
- The methodology has not been widely disseminated and its applications to environmental fields are recent.

Delphi Methodology
Advantages:
- Subjective experience and critical input.
- Complex, large, multidisciplinary problems with considerable uncertainties.
- Possibility of unexpected breakthroughs.
- Particularly long time frames.
- Achieving consensus in areas of uncertainty or in situations lacking causality.
- Focus on issues where multiple stakeholder groups are potentially involved.
- Delphi studies can be relatively simple to design and flexible in the way those designs are combined [25].
Disadvantages:
- Causal models cannot be built or validated.
- Opinions from a large group are required.
- Deficiencies of the researcher or panel members may arise.
- The researcher may impose preconceived ideas on respondents.

Analytic Hierarchy Process (AHP) Methodology
Advantages:
- Allows a complex problem to be broken down and analyzed in parts.
- Allows quantitative and qualitative criteria to be measured using a common scale.
- It facilitates the understanding of the problem by the decision-maker or by those involved in the analysis stage.
- It is easy to use and allows its solution to be complemented with mathematical optimization methods [26].
Disadvantages:
- Additional analysis is required to establish preconditions, since minimum points of agreement among stakeholders regarding objectives, criteria, weights, etc. are required.
- The analysis includes a certain degree of subjectivity, because data from different sources are used.

VI. CONCLUSION
In the present study of the quality of the sediments of the surface water bodies of the upper basin of the Huarmey river, the 30 monitoring points of the basin were classified: 13 (43.3%) presented good sediment quality, 1 (3.3%) moderate quality and 16 (53.3%) poor quality. The good quality is attributed to the points being located on tributaries of the main river, while the moderate quality of one point can be attributed to its being located at a significant distance from the MTZ SAC mine. The points with poor sediment quality, which are the majority of those sampled, are located on the main river, close to the activities of the mining industries (mainly Huinac) and the discharges of their effluents.
The study used the Grey Clustering method and Shannon's entropy. The Grey Clustering methodology proved effective because it considers uncertainty within the analysis; in addition, the analysis was enhanced with Shannon's entropy, which allows the weighting to be carried out objectively, without the need for expert judgment. Together, they made it possible to classify the quality of the sediments, which is pertinent in the Peruvian context given the lack of regulations regarding sediments.
The information obtained from this evaluation is relevant because it supports timely decision-making by public entities and the central government with powers in the upper basin of the Huarmey River, Peru. Finally, the study serves as a basis for future research on sediment quality, and should be complemented with subsequent studies characterizing water, soil and sediments influenced by natural formations in the Huarmey river basin.