Title:
Global or local modeling for XGBoost in geospatial studies upon simulated data and German COVID-19 infection forecasting.
Authors:
Cheng X; Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, 10587, Berlin, Germany. ximeng.cheng@hhi.fraunhofer.de., Ma J; Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, 10587, Berlin, Germany.
Source:
Scientific reports [Sci Rep] 2025 Mar 14; Vol. 15 (1), pp. 8858. Date of Electronic Publication: 2025 Mar 14.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: MEDLINE
Imprint Name(s):
Original Publication: London : Nature Publishing Group, copyright 2011-
Contributed Indexing:
Keywords: Machine learning; Spatial heterogeneity; Spatial partitioning; Spatial variation; Time-series forecasting
Entry Date(s):
Date Created: 20250315 Date Completed: 20250513 Latest Revision: 20250514
Update Code:
20250515
PubMed Central ID:
PMC11909275
DOI:
10.1038/s41598-025-92995-6
PMID:
40087346
Database:
MEDLINE

Methods from artificial intelligence (AI), in particular machine learning and deep learning, have advanced rapidly in recent years and have been applied in many fields, including geospatial analysis. Geospatial studies typically model homogeneous regions locally within the overall study area, for two reasons: spatial heterogeneity, and the inability of conventional methods to mine large datasets. AI models, however, can process large amounts of data, and in theory, the more diverse the training data, the more robust a well-trained model will be. In this paper, we study a typical machine learning method, XGBoost, and ask: is it better to build a single global model or multiple local models for XGBoost in geospatial studies? To compare global and local modeling, XGBoost is first evaluated on simulated data and then applied to forecasting daily COVID-19 infection cases in Germany. The results indicate that global modeling with XGBoost performs better in most cases when spatial variation is low, i.e., when the data under the different relationships between independent and dependent variables are balanced and their value ranges are similar. Otherwise, local modeling with XGBoost is more stable and performs better, especially for the secondary data. In addition, local modeling lends itself to parallel computing because each sub-model is trained independently, but the spatial partitioning that local modeling relies on requires extra attention and can affect the results.
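To make the global-versus-local distinction concrete, the following is a minimal Python sketch on toy simulated data. It is not the authors' setup. To keep it standard-library only, ordinary least squares stands in for XGBoost. The two regions "A" and "B", their opposite slopes, and the helper names (`simulate`, `fit_ols`, `compare`) are hypothetical. Because each region has a different relationship between the independent and dependent variable, this toy corresponds to the high-spatial-variation case, where the abstract reports that local modeling does better. A single pooled linear model cannot represent both slopes at once.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares (a stand-in for XGBoost)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return (my - b * mx, b)


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model."""
    a, b = model
    return mean((a + b * x - y) ** 2 for x, y in zip(xs, ys))


def simulate(n=200, seed=0):
    """Hypothetical regions whose x -> y relationships differ (spatial heterogeneity)."""
    rng = random.Random(seed)
    data = {}
    for region, slope in (("A", 2.0), ("B", -2.0)):
        xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
        ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
        data[region] = (xs, ys)
    return data


def compare(train, test):
    """Return {region: (global_test_mse, local_test_mse)}."""
    # Global modeling: one model trained on all regions pooled together.
    all_x = [x for xs, _ in train.values() for x in xs]
    all_y = [y for _, ys in train.values() for y in ys]
    global_model = fit_ols(all_x, all_y)

    # Local modeling: one sub-model per region. Each fit uses only its own
    # region's data, so the fits are independent and can be dispatched
    # concurrently. (Threads give no real speedup for pure-Python OLS
    # because of the GIL; they only show the structure here.)
    regions = list(train)
    with ThreadPoolExecutor() as pool:
        fitted = pool.map(lambda r: fit_ols(*train[r]), regions)
        local_models = dict(zip(regions, fitted))

    # Evaluate both approaches on held-out data from each region.
    return {
        r: (mse(global_model, *test[r]), mse(local_models[r], *test[r]))
        for r in regions
    }


if __name__ == "__main__":
    results = compare(simulate(seed=0), simulate(seed=1))
    for region, (g, l) in results.items():
        print(f"region {region}: global MSE={g:.2f}, local MSE={l:.2f}")
```

In this toy, the pooled model's slope ends up near zero, so its held-out error is far larger than that of the per-region models. To get a real parallel speedup, you would need a process pool or a training library whose work runs outside the GIL. Substituting XGBoost and a realistic spatial partition would change the numbers. As the abstract notes, the choice of partition can itself affect the results.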
(© 2025. The Author(s).)

Declarations. Competing interests: The authors declare no competing interests.