Does Reinforcement Learning Improve Outcomes for Critically Ill Patients? A Systematic Review and Level-of-Readiness Assessment.
Jumper J, Evans R, Pritzel A, et al.: Highly accurate protein structure prediction with AlphaFold. Nature. 2021; 596:583–589.
Levine S, Kumar A, Tucker G, et al.: Offline reinforcement learning: Tutorial, review, and perspectives on open problems, 2020. Available at: https://arxiv.org/abs/2005.01643 . Accessed August 17, 2023.
Sutton RS, Barto AG: Reinforcement Learning: An Introduction. Second Edition. Cambridge, MA, US, The MIT Press, 2018.
François-Lavet V, Henderson P, Islam R, et al.: An introduction to deep reinforcement learning. Found Trends® Mach Learn. 2018; 11:219–354.
Liu S, See KC, Ngiam KY, et al.: Reinforcement learning for clinical decision support in critical care: Comprehensive review. J Med Internet Res. 2020; 22:e18477.
Girbes ARJ, de Grooth H-J: Time to stop randomized and large pragmatic trials for intensive care medicine syndromes: The case of sepsis and acute respiratory distress syndrome. J Thorac Dis. 2020; 12:S101–S109.
Johnson AEW, Pollard TJ, Shen L, et al.: MIMIC-III, a freely accessible critical care database. Sci Data. 2016; 3:160035.
Thoral PJ, Peppink JM, Driessen RH, et al.; Amsterdam University Medical Centers Database (AmsterdamUMCdb) Collaborators and the SCCM/ESICM Joint Data Science Task Force: Sharing ICU Patient Data Responsibly Under the Society of Critical Care Medicine/European Society of Intensive Care Medicine Joint Data Science Collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) Example. Crit Care Med. 2021; 49:e563–e577.
Sauer CM, Dam TA, Celi LA, et al.: Systematic review and comparison of publicly available ICU data sets—a decision guide for clinicians and data scientists. Crit Care Med. 2022; 50:e581–e588.
Page MJ, McKenzie JE, Bossuyt PM, et al.: The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ. 2021; 372:n71.
Fleuren LM, Thoral P, Shillan D, et al.; Right Data Right Now Collaborators: Machine learning in intensive care medicine: Ready for take-off? Intensive Care Med. 2020; 46:1486–1488.
Grames EM, Stillman AN, Tingley MW, et al.: An automated approach to identifying search terms for systematic reviews using keyword co-occurrence networks. Methods Ecol Evol. 2019; 10:1645–1654.
Haddaway NR, Grainger MJ, Gray CT: Citationchaser: A tool for transparent and efficient forward and backward citation chasing in systematic searching. Res Synth Methods. 2022; 13:533–545.
Ouzzani M, Hammady H, Fedorowicz Z, et al.: Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016; 5:210.
Plaat A: Deep reinforcement learning. Singapore, Springer Nature Singapore, 2022. Available at: https://link.springer.com/10.1007/978-981-19-0638-1 . Accessed January 18, 2023.
Schünemann H, Brożek J, Guyatt G, et al.: GRADE Handbook, 2013. Available at: https://training.cochrane.org/resource/grade-handbook . Accessed October 30, 2023.
Wolff RF, Moons KGM, Riley RD, et al.; PROBAST Group†: PROBAST: A tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019; 170:51–58.
Eghbali N, Alhanai T, Ghassemi MM: Patient-specific sedation management via deep reinforcement learning. Front Digit Health. 2021; 3:608893.
Guo H, Li J, Liu H, et al.: Learning dynamic treatment strategies for coronary heart diseases by artificial intelligence: Real-world data-driven study. BMC Med Inform Decis Mak. 2022; 22:39.
Komorowski M, Celi LA, Badawi O, et al.: The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. 2018; 24:1716–1720.
Peine A, Hallawa A, Bickenbach J, et al.: Development and validation of a reinforcement learning algorithm to dynamically optimize mechanical ventilation in critical care. Npj Digit Med. 2021; 4:1–12.
Qiu X, Tan X, Li Q, et al.: A latent batch-constrained deep reinforcement learning approach for precision dosing clinical decision support. Knowledge Based Syst. 2022; 237:107689.
Roggeveen L, el Hassouni A, Ahrendt J, et al.: Transatlantic transferability of a new reinforcement learning model for optimizing haemodynamic treatment for critically ill patients with sepsis. Artif Intell Med. 2021; 112:102003.
Zheng H, Zhu J, Xie W, et al.: Reinforcement learning assisted oxygen therapy for COVID-19 patients under intensive care. BMC Med Inform Decis Mak. 2021; 21:350.
Zhu S, Pu J: A self-supervised method for treatment recommendation in sepsis. Front Inf Technol Electron Eng. 2021; 22:926–939.
Lin R, Stanley MD, Ghassemi MM, et al.: A deep deterministic policy gradient approach to medication dosing and surveillance in the ICU. Annu Int Conf IEEE Eng Med Biol Soc. 2018; 2018:4927–4931.
Futoma J, Masood MA, Doshi-Velez F: Identifying distinct, effective treatments for acute hypotension with SODA-RL: Safely optimized diverse accurate reinforcement learning. AMIA Jt Summits Transl Sci Proc. 2020; 2020:181–190.
Lopez-Martinez D, Eschenfeldt P, Ostvar S, et al.: Deep reinforcement learning for optimal critical care pain management with morphine using dueling double-deep Q networks. Annu Int Conf IEEE Eng Med Biol Soc. 2019; 2019:3960–3963.
Nemati S, Ghassemi MM, Clifford GD: Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. Annu Int Conf IEEE Eng Med Biol Soc. 2016; 2016:2978–2981.
Prasad N, Cheng L-F, Chivers C, et al.: A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. In: Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI, Sydney, Australia, August 11–15, 2017. AUAI Press, 2017.
Utomo CP, Kurniawati H, Li X, et al.: Personalised medicine in critical care using bayesian reinforcement learning. In: Advanced Data Mining and Applications. Li J, Wang S, Qin S, et al (Eds). Cham, Springer, 2019, pp 648–657.
Peng X, Ding Y, Wihl D, et al.: Improving sepsis treatment strategies by combining deep and kernel-based reinforcement learning, 2019. Available at: http://arxiv.org/abs/1901.04670 . Accessed January 8, 2023.
Tsoukalas A, Albertson T, Tagkopoulos I: From data to optimal decision making: A data-driven, probabilistic machine learning approach to decision support for patients with sepsis. JMIR Med Inform. 2015; 3:e11.
Sun C, Hong S, Song M, et al.: Personalized vital signs control based on continuous action-space reinforcement learning with supervised experience. Biomed Signal Proc Control. 2021; 69:102847.
Nanayakkara T, Clermont G, Langmead CJ, et al.: Unifying cardiovascular modelling with deep reinforcement learning for uncertainty aware control of sepsis treatment. PLOS Digit Health. 2022; 1:e0000012.
Ma P, Liu J, Shen F, et al.: Individualized resuscitation strategy for septic shock formalized by finite mixture modeling and dynamic treatment regimen. Crit Care. 2021; 25:243.
Li T, Wang Z, Lu W, et al.: Electronic health records based reinforcement learning for treatment optimizing. Inf Syst. 2022; 104:101878.
Liang D, Deng H, Liu Y: The treatment of sepsis: An episodic memory-assisted deep reinforcement learning approach. Appl Intell. 2022; 53:11034–11044.
Baucum M, Khojandi A, Vasudevan R, et al.: Adapting reinforcement learning treatment policies using limited data to personalize critical care. INFORMS J Data Sci. 2022; 1:27–49.
Su L, Li Y, Liu S, et al.: Establishment and implementation of potential fluid therapy balance strategies for ICU sepsis patients based on reinforcement learning. Front Med. 2022; 9:766447.
Chen S, Qiu X, Tan X, et al.: A model-based hybrid soft actor-critic deep reinforcement learning algorithm for optimal ventilator settings. Inf Sci. 2022; 611:47–64.
Festor P, Jia Y, Gordon AC, et al.: Assuring the safety of AI-based clinical decision support systems: A case study of the AI clinician for sepsis treatment. BMJ Health Care Inform. 2022; 29:e100549.
Raghu A, Komorowski M, Celi LA, et al.: Continuous state-space models for optimal sepsis treatment: A deep reinforcement learning approach. In: Proceedings of the 2nd Machine Learning for Healthcare Conference. PMLR, 2017, pp 147–163.
Satija H, Thomas PS, Pineau J, et al.: Multi-objective SPIBB: Seldonian offline policy improvement with safety constraints in finite MDPs. In: Advances in Neural Information Processing Systems. Curran Associates, Inc., 2021, pp 2004–2017. Available at: https://proceedings.neurips.cc/paper/2021/hash/0f65caf0a7d00afd2b87c028e88fe931-Abstract.html.
Raghu A: Reinforcement learning for sepsis treatment: Baselines and analysis. Reinforcement Learning for Real Life Workshop, 36th International Conference on Machine Learning, Long Beach, California, USA, 2019. Available at: https://openreview.net/forum?id=BJekwh0ToN.
Li L, Albert-Smet I, Faisal AA: Optimizing medical treatment for sepsis in intensive care: From reinforcement learning to pre-trial evaluation, 2020. Available at: http://arxiv.org/abs/2003.06474 . Accessed January 8, 2023.
Baucum M, Khojandi A, Vasudevan R: Improving deep reinforcement learning with transitional variational autoencoders: A healthcare application. IEEE J Biomed Health Inform. 2021; 25:2273–2280.
Futoma J, Hughes M, Doshi-Velez F: POPCORN: Partially observed prediction constrained reinforcement learning. In: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. PMLR, 2020. pp 3578–3588. Available at: https://proceedings.mlr.press/v108/futoma20a.html . Accessed January 8, 2023.
Jia Y, Burden J, Lawton T, et al.: Safe reinforcement learning for sepsis treatment. In: 2020 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2020. pp 1–7.
Killian TW, Zhang H, Subramanian J, et al.: An empirical study of representation learning for reinforcement learning in healthcare. In: Proceedings of the Machine Learning for Health NeurIPS Workshop. PMLR, 2020, pp 139–160. Available at: https://proceedings.mlr.press/v136/killian20a.html . Accessed January 8, 2023.
Tang S, Modi A, Sjoding M, et al.: Clinician-in-the-loop decision making: Reinforcement learning with near-optimal set-valued policies. In: Proceedings of the 37th International Conference on Machine Learning. PMLR, 2020, pp 9387–9396. Available at: https://proceedings.mlr.press/v119/tang20c.html . Accessed January 8, 2023.
Wang L, Yu W, He X, et al.: Adversarial cooperative imitation learning for dynamic treatment regimes. In: Proceedings of The Web Conference 2020. New York, NY, USA, Association for Computing Machinery, 2020, pp 1785–1795. Available at: https://doi.org/10.1145/3366423.3380248 . Accessed January 8, 2023.
Liu X, Yu C, Huang Q, et al.: Combining model-based and model-free reinforcement learning policies for more efficient sepsis treatment. In: Bioinformatics Research and Applications. Wei Y, Li M, Skums P, et al (Eds). Cham, Springer International Publishing, 2021, pp 105–117.
den Hengst F, Grua EM, el Hassouni A, et al.: Reinforcement learning for personalization: A systematic literature review. Data Sci. 2020; 3:107–147.
van de Sande D, van Genderen ME, Huiskens J, et al.: Moving from bytes to bedside: A systematic review on the use of artificial intelligence in the intensive care unit. Intensive Care Med. 2021; 47:750–760.
Gottesman O, Johansson F, Meier J, et al.: Evaluating reinforcement learning algorithms in observational health settings, 2018. Available at: http://arxiv.org/abs/1805.12298 . Accessed January 4, 2023.
Lu M, Shahn Z, Sow D, et al.: Is deep reinforcement learning ready for practical applications in healthcare? A sensitivity analysis of duel-DDQN for hemodynamic management in sepsis patients. AMIA Annu Symp Proc. 2021; 2020:773–782.
Charpignon M-L, Byers J, Cabral S, et al.: Critical bias in critical care devices. Crit Care Clin. 2023; 39:795–813.
Romanowski B, Ben Abacha A, Fan Y: Extracting social determinants of health from clinical note text with classification and sequence-to-sequence approaches. J Am Med Inform Assoc. 2023; 30:1448–1455.
Futoma J, Simons M, Panch T, et al.: The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health. 2020; 2:e489–e492.
Gottesman O, Johansson F, Komorowski M, et al.: Guidelines for reinforcement learning in healthcare. Nat Med. 2019; 25:16–18.
Further Information
Objective: Reinforcement learning (RL) is a machine learning technique uniquely suited to sequential decision-making, which makes it potentially relevant to ICU treatment challenges. We set out to systematically review the application of RL in critically ill patients, assess its level-of-readiness, and meta-analyze its effect on patient outcomes.
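For orientation only: the sequential decision-making objective that RL methods optimize is conventionally written as the expected discounted return of a treatment policy. The notation below is standard textbook RL (cf. Sutton and Barto), not a formula taken from the review or from any included study.

    J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right], \qquad 0 \le \gamma \le 1,

where s_t is the patient state at time t, a_t the treatment action selected by the policy \pi, r(s_t, a_t) the reward (for example, tied to survival), and \gamma the discount factor weighting future rewards.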
Data Sources: A systematic search was performed in PubMed, Embase.com, Clarivate Analytics/Web of Science Core Collection, Elsevier/SCOPUS and the Institute of Electrical and Electronics Engineers Xplore Digital Library from inception to March 25, 2022, with subsequent citation tracking.
Data Extraction: Journal articles that used an RL technique in an ICU population and reported on patient health-related outcomes were included for full analysis. Conference papers were included for level-of-readiness assessment only. Descriptive statistics, characteristics of the models, outcomes compared with the clinicians' policy, and level-of-readiness were collected. A risk of bias and applicability assessment specific to RL in healthcare was performed.
Data Synthesis: A total of 1,033 articles were screened, of which 18 journal articles and 18 conference papers were included. Thirty of these were prototyping or modeling articles and six were validation articles. All articles reported that RL algorithms outperformed clinical decision-making by ICU professionals, but only on retrospective data. The modeling of the state-space, action-space, and reward function, as well as RL model training and evaluation, varied widely. The risk of bias was high in all articles, mainly due to the evaluation procedure.
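To make these modeling choices concrete, the sketch below illustrates one recurring pattern in this literature: a discretized patient state space, a small grid of treatment actions, a terminal survival-based reward, and a tabular Q-function fitted offline to logged transitions. This is a minimal, assumption-laden Python illustration, not the method of any specific included study; every size, reward value, and hyperparameter is a placeholder.

    import numpy as np

    # Illustrative offline tabular Q-learning on logged ICU transitions.
    # All sizes, rewards, and hyperparameters are assumed placeholders,
    # not values taken from any study included in the review.
    N_STATES = 750   # e.g., patient states obtained by clustering vitals/labs
    N_ACTIONS = 25   # e.g., a 5 x 5 grid of fluid and vasopressor dose bins
    GAMMA = 0.99     # discount factor
    ALPHA = 0.1      # learning rate
    SWEEPS = 50      # passes over the logged data

    def fit_q(transitions):
        """transitions: (state, action, reward, next_state, done) tuples
        extracted from retrospective EHR data (behaviour = clinician policy)."""
        q = np.zeros((N_STATES, N_ACTIONS))
        for _ in range(SWEEPS):
            for s, a, r, s_next, done in transitions:
                target = r if done else r + GAMMA * q[s_next].max()
                q[s, a] += ALPHA * (target - q[s, a])
        return q

    # Synthetic logged data: terminal reward +1 for survival, -1 for death.
    rng = np.random.default_rng(0)
    logged = [(rng.integers(N_STATES), rng.integers(N_ACTIONS),
               0.0, rng.integers(N_STATES), False) for _ in range(1000)]
    logged += [(rng.integers(N_STATES), rng.integers(N_ACTIONS),
                float(rng.choice([1.0, -1.0])), 0, True) for _ in range(100)]
    q_table = fit_q(logged)
    rl_policy = q_table.argmax(axis=1)  # greedy RL-agent action per state

In the included studies, a greedy policy derived from such a fitted value function is what is then compared, retrospectively, with the clinicians' logged decisions.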
Conclusion: In this first systematic review of the application of RL in intensive care medicine, we found no studies demonstrating improved patient outcomes from RL-based technologies. All studies reported that RL-agent policies outperformed clinician policies, but these assessments were based solely on retrospective off-policy evaluation.
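Because this conclusion rests on retrospective off-policy evaluation, a brief sketch of one commonly used estimator, weighted importance sampling, may help show why such evidence is weaker than prospective testing: the estimate depends entirely on logged clinician behaviour and on estimated action probabilities. The Python below is a generic illustration with assumed inputs, not the evaluation code of any included study.

    import numpy as np

    # Illustrative weighted importance sampling (WIS) estimate of an RL
    # policy's value from retrospective trajectories. Inputs are assumed.
    def wis_value(trajectories, gamma=0.99):
        """trajectories: list of episodes, each a list of
        (behaviour_prob, target_prob, reward) tuples; behaviour_prob is the
        estimated probability the clinician took the logged action and
        target_prob the probability the evaluated RL policy assigns to it."""
        weights, returns = [], []
        for episode in trajectories:
            rho, ret, discount = 1.0, 0.0, 1.0
            for b_prob, t_prob, reward in episode:
                rho *= t_prob / max(b_prob, 1e-8)  # cumulative importance ratio
                ret += discount * reward
                discount *= gamma
            weights.append(rho)
            returns.append(ret)
        weights, returns = np.asarray(weights), np.asarray(returns)
        return float((weights * returns).sum() / max(weights.sum(), 1e-8))

    # Hypothetical usage: two short episodes ending in survival (+1) or death (-1).
    episodes = [
        [(0.6, 0.9, 0.0), (0.5, 0.8, 1.0)],   # survivor, up-weighted
        [(0.7, 0.2, 0.0), (0.6, 0.1, -1.0)],  # non-survivor, down-weighted
    ]
    print(wis_value(episodes))

When the RL policy rarely agrees with the clinicians, the importance weights concentrate on a few trajectories and the estimate becomes unstable, which is one reason the evaluation procedure drove the high risk-of-bias judgments above.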
(Copyright © 2023 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.)
Dr. Dam’s institution received funding from ZonMW/Netherlands Organization for Health Research and Development (10430012010003); he received funding from Pacmed BV. Dr. Hengst received funding from ING Bank N.V. Dr. Hoogendoorn disclosed co-ownership of PersonalAIze B.V. The remaining authors have disclosed that they do not have any potential conflicts of interest.