Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
Digital Object Identifier (DOI): 10.14569/IJACSA.2014.050314
Article published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 5, Issue 3, 2014.
Abstract: A new machine learning tool has been developed to classify water stations with similar water quality trends. The tool is based on Weighted Regressions in Time, Discharge, and Season (WRTDS), a statistical method developed by the United States Geological Survey (USGS) to estimate daily concentrations of water constituents in rivers and streams from continuous daily discharge data and discrete water quality samples collected at the same or nearby locations. WRTDS is based on parametric survival regressions and uses a jackknife cross-validation procedure that generates unbiased estimates of the prediction errors. One disadvantage of WRTDS is that it needs a large number of samples (n > 200) collected over at least two decades. In this article, the tool is used to evaluate Boosted Regression Trees (BRT) as an alternative to the parametric survival regressions for water quality stations with a small number of samples. We describe the development of the machine learning tool and compare the two methods, WRTDS and BRT. The purpose of the tool is to evaluate the reduction in the variability of the estimates obtained by clustering data from nearby stations with similar concentration and discharge characteristics. The results indicate that, with clustering, the concentrations predicted by BRT are in general higher than the observed concentrations. In addition, BRT appears to generate a higher sum of squared residuals than the parametric survival regressions.
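As an illustration of the jackknife cross-validation and the sum of squared residuals mentioned in the abstract, the following minimal sketch leaves each sample out in turn, refits a model on the remaining samples, and accumulates the squared prediction error for the held-out sample. It is not the WRTDS procedure: as a simplifying assumption it fits an unweighted straight line of log concentration against log discharge, and it omits the time and season terms, the weighting, and the survival regression. The data values and the function names (`fit_line`, `in_sample_ssr`, `jackknife_ssr`) are hypothetical and chosen only for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def in_sample_ssr(xs, ys):
    """Sum of squared residuals when the model is fit on all samples."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def jackknife_ssr(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


# Hypothetical samples: daily discharge and measured concentration.
discharge = [12.0, 30.0, 45.0, 80.0, 150.0, 310.0, 520.0, 900.0]
concentration = [4.1, 3.2, 3.0, 2.1, 1.9, 1.2, 1.1, 0.7]
log_q = [math.log(q) for q in discharge]
log_c = [math.log(c) for c in concentration]

print("in-sample SSR:", round(in_sample_ssr(log_q, log_c), 4))
print("jackknife SSR:", round(jackknife_ssr(log_q, log_c), 4))
```

For a least-squares fit, the jackknife sum is never smaller than the in-sample sum, because each held-out sample has no influence on the model used to predict it. Under this setup, comparing two methods would mean computing the same held-out error for each of them.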
Alexander Maestre, Eman El-Sheikh, Derek Williamson and Amelia Ward, “A Machine Learning Tool for Weighted Regressions in Time, Discharge, and Season,” International Journal of Advanced Computer Science and Applications (IJACSA), 5(3), 2014. http://dx.doi.org/10.14569/IJACSA.2014.050314