Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
Digital Object Identifier (DOI): 10.14569/IJACSA.2013.040503
Article Published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 4 Issue 5, 2013.
Abstract: Pollutant forecasting is an important problem in the environmental sciences. Data mining is an approach to discovering knowledge from large data sets. This paper uses data mining methods to forecast the concentration level of PM2.5, an important air pollutant. Several tree-based classification algorithms are available in data mining, such as CART, C4.5, Random Forest (RF) and C5.0. RF and C5.0 are popular ensemble methods: RF builds on CART with bagging, and C5.0 builds on C4.5 with boosting. This paper builds predictive models of PM2.5 concentration level based on RF and C5.0 using R packages. The data set covers the period from 2000 to 2011 in a new town of Hong Kong. The PM2.5 concentration is divided into 2 levels, with a critical point of 25 µg/m³ (24-hour mean). Based on 100 repetitions of 10-fold cross-validation, the best testing accuracy comes from the RF model, at around 0.845 to 0.854.
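The evaluation protocol described above (two concentration levels split at 25 µg/m³, scored by 100 repetitions of 10-fold cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' R code: it assumes a 24-hour mean of exactly 25 µg/m³ counts as the higher level, that each repetition reports one pooled test accuracy, and it uses a hypothetical majority-class placeholder (`fit_majority`) in place of RF or C5.0, which are not implemented here. All function names are illustrative.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

THRESHOLD_UG_M3 = 25.0  # critical point from the abstract (24-hour mean, µg/m³)

# A "fit" function takes training features and labels and returns a predictor.
Classifier = Callable[[Sequence[Sequence[float]], Sequence[str]],
                      Callable[[Sequence[float]], str]]


def pm25_level(mean_24h: float) -> str:
    """Map a 24-hour mean PM2.5 concentration to one of two levels.
    Assumption: values at or above the threshold are labelled 'high'."""
    return "high" if mean_24h >= THRESHOLD_UG_M3 else "low"


def k_fold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_accuracy(X: Sequence[Sequence[float]], y: Sequence[str],
                         fit: Classifier, k: int = 10, repeats: int = 100,
                         seed: int = 0) -> List[float]:
    """Run k-fold cross-validation `repeats` times with fresh shuffles and
    return one pooled test accuracy per repetition."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for fold in k_fold_indices(len(y), k, rng):
            test = set(fold)
            train = [i for i in range(len(y)) if i not in test]
            predict = fit([X[i] for i in train], [y[i] for i in train])
            correct += sum(predict(X[i]) == y[i] for i in fold)
        # Every sample lands in exactly one test fold per repetition.
        accuracies.append(correct / len(y))
    return accuracies


def fit_majority(X_train: Sequence[Sequence[float]],
                 y_train: Sequence[str]) -> Callable[[Sequence[float]], str]:
    """Hypothetical placeholder model: always predicts the most common
    training label. It stands in for RF or C5.0 only to exercise the loop."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority
```

Swapping a real tree ensemble in for `fit_majority` and reporting `min(accs)` and `max(accs)` over the 100 repetitions gives a range of the same kind as the one quoted in the abstract; how the authors aggregated their runs is not stated here, so this is one plausible reading.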
Yin Zhao and Yahya Abu Hasan, “Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms,” International Journal of Advanced Computer Science and Applications (IJACSA), 4(5), 2013. http://dx.doi.org/10.14569/IJACSA.2013.040503