Result: Global or local modeling for XGBoost in geospatial studies upon simulated data and German COVID-19 infection forecasting.
Methods from artificial intelligence (AI), in particular machine learning and deep learning, have advanced rapidly in recent years and have been applied to many fields, including geospatial analysis. Because of spatial heterogeneity, and because conventional methods cannot mine large datasets, geospatial studies typically model homogeneous regions locally within the study area. AI models, however, can process large amounts of data, and, in theory, the more diverse the training data, the more robust a well-trained model will be. In this paper, we examine a typical machine learning method, XGBoost, and ask: is it better to build a single global model or multiple local models for XGBoost in geospatial studies? To compare global and local modeling, XGBoost is first evaluated on simulated data and then applied to forecasting daily COVID-19 infection cases in Germany. The results indicate that when the data governed by different relationships between the independent and dependent variables are balanced and their value ranges are similar, i.e., spatial variation is low, global modeling of XGBoost performs better in most cases; otherwise, local modeling of XGBoost is more stable and accurate, especially for the secondary data. In addition, local modeling lends itself to parallel computing because each sub-model is trained independently, but the spatial partitioning that local modeling requires needs extra care and can affect the results.
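To illustrate the two strategies compared above, the following minimal Python sketch trains one pooled ("global") XGBoost regressor and one independently trained ("local") regressor per simulated region. The synthetic data, the use of a region identifier as a feature in the global model, and all hyperparameters are illustrative assumptions and do not reproduce the paper's actual experimental setup.

```python
# Minimal sketch (assumed setup, not the paper's): global vs. local XGBoost
# models on synthetic data in which each region follows its own relationship
# between the independent variable and the target (spatial heterogeneity).
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_region(n, slope):
    # Each region gets a different linear relationship plus noise.
    x = rng.uniform(0, 10, size=(n, 1))
    y = slope * x[:, 0] + rng.normal(0, 0.5, size=n)
    return x, y

regions = {r: make_region(500, slope) for r, slope in enumerate([1.0, -2.0, 0.5])}

# Global model: pool all regions, with the region id as an extra feature
# (an illustrative choice standing in for spatial covariates).
Xg = np.vstack([np.column_stack([x, np.full(len(x), r)]) for r, (x, y) in regions.items()])
yg = np.concatenate([y for _, y in regions.values()])
global_model = xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(Xg, yg)

# Local models: one independently trained model per region; since the
# sub-models do not share state, this loop could run in parallel.
local_models = {r: xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(x, y)
                for r, (x, y) in regions.items()}

# Compare in-region errors of the global and local models.
for r, (x, y) in regions.items():
    Xr = np.column_stack([x, np.full(len(x), r)])
    rmse_global = mean_squared_error(y, global_model.predict(Xr)) ** 0.5
    rmse_local = mean_squared_error(y, local_models[r].predict(x)) ** 0.5
    print(f"region {r}: global RMSE={rmse_global:.3f}, local RMSE={rmse_local:.3f}")
```

In this toy setting the regions are balanced and their value ranges coincide, so the pooled model tends to be competitive; making the regions unbalanced or giving them different value ranges shifts the advantage toward the local models, mirroring the tendency described in the abstract.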
(© 2025. The Author(s).)
Declarations. Competing interests: The authors declare no competing interests.