Poissonian Two-Armed Bandit: A New Approach, Problems of Information Transmission


Poissonian Two-Armed Bandit: A New Approach
Problems of Information Transmission (IF 1.2) Pub Date: 2022-07-11, DOI: 10.1134/s0032946022020065
A. V. Kolnogorov

We consider a new approach to the continuous-time two-armed bandit problem in which incomes are described by Poisson processes. For this purpose, first, the control horizon is divided into equal consecutive half-intervals in which the strategy remains constant, and the incomes arrive in batches corresponding to these half-intervals. For finding the optimal piecewise constant Bayesian strategy and its corresponding Bayesian risk, a recursive difference equation is derived. The existence of a limiting value of the Bayesian risk when the number of half-intervals grows infinitely is established, and a partial differential equation for finding it is derived. Second, unlike previously considered settings of this problem, we analyze the strategy as a function of the current history of the controlled process rather than of the evolution of the posterior distribution. This removes the requirement of finiteness of the set of admissible parameters, which was imposed in previous settings. Simulation shows that in order to find the Bayesian and minimax strategies and risks in practice, it is sufficient to partition the arriving incomes into 30 batches. In the case of the minimax setting, it is shown that optimal processing of arriving incomes one by one is not more efficient than optimal batch processing if the control horizon grows infinitely.
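The batch setting described in the abstract can be illustrated with a small simulation. The sketch below divides the control horizon into equal intervals, keeps the chosen arm fixed within each interval, and observes the Poisson income of that interval as a single batch. The decision rule (try each arm once, then follow the higher empirical income rate) is a simple placeholder chosen for illustration and is not the optimal Bayesian or minimax strategy derived in the paper. The names `poisson_count` and `run_batched_bandit`, the example rates, and the loss measure are all assumptions introduced here.

```python
import random


def poisson_count(rate, duration, rng):
    """Count arrivals of a Poisson process with the given rate in [0, duration)."""
    if rate <= 0 or duration <= 0:
        return 0
    count = 0
    t = rng.expovariate(rate)  # time of the first arrival
    while t < duration:
        count += 1
        t += rng.expovariate(rate)  # exponential inter-arrival times
    return count


def run_batched_bandit(rates, horizon, n_batches, seed=0):
    """Simulate a piecewise-constant strategy on a two-armed Poisson bandit.

    Illustrative rule only (NOT the paper's optimal strategy): play each arm
    for one batch, then always pick the arm with the higher empirical rate.
    Returns the arm chosen per batch, the income per arm, and the expected
    loss relative to always playing the better arm.
    """
    if n_batches < 2:
        raise ValueError("need at least two batches to try both arms")
    rng = random.Random(seed)
    dt = horizon / n_batches  # length of one batch interval
    income = [0, 0]           # total income collected from each arm
    time_used = [0.0, 0.0]    # total time spent on each arm
    choices = []
    for k in range(n_batches):
        if k < 2:
            arm = k  # initial exploration: one batch per arm
        else:
            estimates = [income[i] / time_used[i] for i in range(2)]
            arm = 0 if estimates[0] >= estimates[1] else 1
        income[arm] += poisson_count(rates[arm], dt, rng)
        time_used[arm] += dt
        choices.append(arm)
    expected_income = sum(rates[i] * time_used[i] for i in range(2))
    loss = max(rates) * horizon - expected_income
    return choices, income, loss


if __name__ == "__main__":
    # 30 batches is the figure mentioned in the abstract; the rates are made up.
    choices, income, loss = run_batched_bandit((1.0, 1.5), horizon=30.0, n_batches=30)
    print("arms chosen per batch:", choices)
    print("income per arm:", income, "expected loss:", loss)
```

Because the strategy only changes at batch boundaries, the decision rule is evaluated `n_batches` times regardless of how many individual incomes arrive, which is the practical appeal of batch processing noted in the abstract.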


Updated: 2022-07-13
