The study area defined by the coordinates (90°E − 92°E, 23°N − 25°N) is a significant region in Bangladesh, where accurate rainfall predictions are crucial for both the local population and policymakers. Understanding rainfall patterns in this area is vital for effective planning and resource management. Data on atmospheric variables, including temperature, rainfall, humidity, sea level pressure, and wind speed, were collected from the Bangladesh Meteorological Department for various locations across the study grid for the period 1964 to 2015. Descriptive statistics showed that the climate parameters are not normally distributed. This dataset serves as the foundation for analyzing climate parameters and forecasting rainfall levels within the study grid. The study evaluates machine learning techniques, namely artificial neural networks (ANN), classification and regression trees (CART), C5.0, Random Forest (RF), and Gradient Boosting Machines (GBM), as alternatives to traditional statistical models for predicting atmospheric phenomena. It notes that conventional models often rely on assumptions unsuitable for chaotic systems like the atmosphere. Among the five assessed models, the ANN demonstrated the highest predictive capability for rainfall forecasting in Bangladesh, achieving the highest training accuracy and Kappa values and ranking as the best overall performer on the ranking metrics.
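The abstract compares the classifiers by accuracy and Kappa. As a rough illustration of how those two figures relate, the following minimal Python sketch computes overall accuracy and Cohen's kappa from paired observed and predicted class labels. The function name `accuracy_and_kappa`, the two rainfall-level labels, and the ten sample values are hypothetical and are not taken from the study's data; the sketch also assumes that the reported Kappa is Cohen's kappa, which the abstract does not state.

```python
from collections import Counter


def accuracy_and_kappa(observed, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)

    # Observed agreement: share of cases where the prediction matches the observation.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Chance agreement: for each class, the product of its observed and
    # predicted marginal frequencies, summed over classes.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)

    if p_e == 1.0:
        # Only one class occurs in both sequences, so kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical rainfall-level labels (illustrative only, not the study's data).
observed = ["low"] * 5 + ["high"] * 5
predicted = ["low", "low", "low", "low", "high",
             "high", "high", "high", "high", "low"]

acc, kappa = accuracy_and_kappa(observed, predicted)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.80, kappa=0.60
```

In this toy case the classifier is right 8 times out of 10, but half of that agreement would be expected by chance given the balanced class frequencies, so kappa (0.60) is lower than accuracy (0.80). This is one reason a model comparison may report both figures.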
Published in: International Journal of Data Science and Analysis (Volume 10, Issue 6)
DOI: 10.11648/j.ijdsa.20241006.11
Page(s): 109-128
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2024. Published by Science Publishing Group
Keywords: ANN, Decision Tree, GBM, Cross-validation, Rainfall, Bangladesh
APA Style
Rahman, M. H. (2024). ANN-based and DT-based Classification Approaches to Predict the Rainfall Level of the Grid (90°E − 92°E, 23°N − 25°N) in Bangladesh. International Journal of Data Science and Analysis, 10(6), 109-128. https://doi.org/10.11648/j.ijdsa.20241006.11
ACS Style
Rahman, M. H. ANN-based and DT-based Classification Approaches to Predict the Rainfall Level of the Grid (90°E − 92°E, 23°N − 25°N) in Bangladesh. Int. J. Data Sci. Anal. 2024, 10(6), 109-128. doi: 10.11648/j.ijdsa.20241006.11
@article{10.11648/j.ijdsa.20241006.11,
  author = {Md. Habibur Rahman},
  title = {ANN-based and DT-based Classification Approaches to Predict the Rainfall Level of the Grid (90°E − 92°E, 23°N − 25°N) in Bangladesh},
  journal = {International Journal of Data Science and Analysis},
  volume = {10},
  number = {6},
  pages = {109-128},
  doi = {10.11648/j.ijdsa.20241006.11},
  url = {https://doi.org/10.11648/j.ijdsa.20241006.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20241006.11},
  year = {2024}
}
TY  - JOUR
T1  - ANN-based and DT-based Classification Approaches to Predict the Rainfall Level of the Grid (90°E − 92°E, 23°N − 25°N) in Bangladesh
AU  - Md. Habibur Rahman
Y1  - 2024/12/18
PY  - 2024
N1  - https://doi.org/10.11648/j.ijdsa.20241006.11
DO  - 10.11648/j.ijdsa.20241006.11
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 109
EP  - 128
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20241006.11
VL  - 10
IS  - 6
ER  -