A study on time-series prediction and analysis of acidity of Daqu based on multivariate data fusion and KNN-Attention-LSTM-XGBoost modeling.
Further information
Daqu is a traditional Chinese brewing ingredient that performs both saccharification and fermentation during brewing. The acidity of Daqu during fermentation directly affects its quality. Traditional methods for measuring Daqu acidity are complex and slow, which makes it difficult to monitor acidity in real time during fermentation. Because Daqu acidity correlates strongly with environmental variables, this paper proposes a time-series prediction model for Daqu acidity based on the KNN-Attention-LSTM-XGBoost model. After the microenvironmental parameters of Daqu were collected and analyzed, an XGBoost model was used to select the two best-performing imputation methods (LFBI and KNN). Partial least squares regression (PLSR) was used to extract key parameters, and lag and rolling-window features were constructed to capture temporal trends and fluctuations. Comparative analysis showed that KNN preprocessing combined with the Attention-LSTM-XGBoost model predicted Daqu acidity best, with R² values of 0.9790, 0.9768, and 0.9636 for the upper, middle, and lower Daqu layers, respectively. This combination outperformed both baselines, with improvements of 3.87%, 1.11%, and 2.84% over the LSTM-XGBoost model and 4.70%, 4.37%, and 8.46% over the XGBoost model. The study addresses the challenge of predicting Daqu acidity during fermentation and provides insights for optimizing the Daqu fermentation process.
(© 2025. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.)
Declarations. Conflict of interest: The authors declare no competing interests.
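The abstract mentions lag and rolling-window feature extraction without giving details. The following is a minimal sketch of that general technique, not the authors' implementation: the function name `lag_rolling_features`, the lag set, the window length, and the sample temperature readings are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def lag_rolling_features(series, lags=(1, 2), window=3):
    """Build lag and rolling-window features for a univariate series.

    Returns one dict per time step that has enough history.
    Rolling statistics use only past values (t-window .. t-1), so the
    current observation never leaks into its own features.
    """
    start = max(max(lags), window)  # first index with full history
    rows = []
    for t in range(start, len(series)):
        past = series[t - window:t]
        row = {"t": t, "value": series[t]}
        for k in lags:
            row[f"lag_{k}"] = series[t - k]  # value k steps earlier
        row["roll_mean"] = mean(past)        # recent trend level
        row["roll_std"] = pstdev(past)       # recent fluctuation
        rows.append(row)
    return rows


# Hypothetical microenvironment readings (e.g. temperature), not real data
temps = [30.0, 32.0, 34.0, 36.0, 35.0, 33.0]
features = lag_rolling_features(temps)
```

Each resulting row could then be passed to a downstream regressor. Lag values capture short-term dependence, while the rolling mean and standard deviation summarize the recent trend and its variability.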