Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning
Expert Systems with Applications ( IF 8.5 ) Pub Date : 2024-03-24 , DOI: 10.1016/j.eswa.2024.123686
Mathieu Reymond , Conor F. Hayes , Lander Willem , Roxana Rădulescu , Steven Abrams , Diederik M. Roijers , Enda Howley , Patrick Mannion , Niel Hens , Ann Nowé , Pieter Libin

Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision-making in the context of epidemic mitigation is multi-dimensional and hence complex, reinforcement learning combined with complex epidemic models provides a methodology to design refined prevention strategies. Current research focuses on optimizing policies with respect to a single objective, such as the pathogen’s attack rate. However, as epidemic mitigation involves distinct, and possibly conflicting, criteria (e.g., mortality, morbidity, economic cost, well-being), a multi-objective decision approach is warranted to obtain balanced policies. To enhance future decision-making, we propose a deep multi-objective reinforcement learning approach that builds upon a state-of-the-art algorithm, Pareto Conditioned Networks (PCN), to obtain a set of solutions for distinct outcomes of the decision problem. We consider different deconfinement strategies after the first Belgian lockdown of the COVID-19 pandemic, aiming to minimize both COVID-19 cases (i.e., infections and hospitalizations) and the societal burden induced by the mitigation measures. To this end, we connect a multi-objective Markov decision process with a stochastic compartment model designed to approximate the Belgian COVID-19 waves, and we explore reactive strategies. As these social mitigation measures are implemented through a continuous action space that modulates the contact matrix of the age-structured epidemic model, we extend PCN to this setting. We evaluate the solution set that PCN returns and observe that it explores the full range of possible social restrictions, capturing the problem dynamics and yielding high-quality trade-offs. In this work, we demonstrate that multi-objective reinforcement learning adds value to epidemiological modeling and provides essential insights for balancing mitigation policies.
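The coupling described above — a continuous action scaling the contact matrix of an age-structured compartment model, with a vector-valued reward trading off infections against societal burden — can be illustrated with a minimal sketch. All names, rates, and the deterministic SIR dynamics below are illustrative assumptions; the paper's actual model is a stochastic compartment model calibrated to the Belgian COVID-19 waves.

```python
import numpy as np

def sir_step(state, contact_matrix, action, beta=0.05, gamma=0.1):
    """One step of a toy age-structured SIR model (illustrative only).

    `action` in [0, 1] uniformly scales the contact matrix, a
    hypothetical stand-in for the paper's continuous social-restriction
    measures. Returns the next state and a two-objective reward vector:
    (negative new infections, negative social burden of restrictions).
    """
    S, I, R = state                        # arrays over age groups
    C = contact_matrix * action            # restrictions damp contacts
    force = beta * C @ (I / (S + I + R))   # per-age force of infection
    new_inf = S * force
    new_rec = gamma * I
    next_state = (S - new_inf, I + new_inf - new_rec, R + new_rec)
    # Objective 1: minimize infections; objective 2: minimize burden,
    # here modeled simply as the fraction of contacts removed.
    reward = np.array([-new_inf.sum(), -(1.0 - action)])
    return next_state, reward
```

Full restrictions (`action=0`) produce zero new infections but maximal societal burden, while no restrictions (`action=1`) do the opposite — the conflicting objectives a multi-objective method must balance.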
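PCN's key idea — a policy conditioned on a desired return vector and horizon, so that one network covers the whole solution set — can be sketched for the continuous-action setting as follows. The linear-plus-sigmoid head and all dimensions are placeholder assumptions; the actual PCN uses a deep network trained by supervised learning on its replay buffer.

```python
import numpy as np

class PCNPolicy:
    """Minimal sketch of a Pareto-Conditioned policy with a continuous
    action in [0, 1] (hypothetical architecture, not the paper's net).

    The network input concatenates the state with a desired return
    vector (one entry per objective) and a desired horizon; at
    execution time, choosing different desired returns selects
    different trade-offs from the learned solution set.
    """
    def __init__(self, state_dim, n_objectives, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + n_objectives + 1   # state + return + horizon
        self.W = rng.normal(scale=0.1, size=(in_dim,))
        self.b = 0.0

    def act(self, state, desired_return, desired_horizon):
        x = np.concatenate([state, desired_return, [desired_horizon]])
        # Squash a linear score into [0, 1]; a real PCN would use an
        # MLP head, and training would regress actions from episodes
        # whose achieved returns match the conditioning target.
        return 1.0 / (1.0 + np.exp(-(x @ self.W + self.b)))
```

Conditioning on returns rather than scalarized weights is what lets a single network produce the full Pareto front of mitigation policies.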

Updated: 2024-03-24