Does phasic dopamine release cause policy updates?
European Journal of Neuroscience (IF 3.698), Pub Date: 2023-12-01, DOI: 10.1111/ejn.16199
Francis Carter 1, 2 , Marie‐Pierre Cossette 1 , Ivan Trujillo‐Pisanty 1, 3 , Vasilios Pallikaras 1 , Yannick‐André Breton 1 , Kent Conover 1 , Jill Caplan 1 , Pavel Solis 1 , Jacques Voisard 1 , Alexandra Yaksich 1 , Peter Shizgal 1

Phasic dopamine activity is believed to both encode reward-prediction errors (RPEs) and to cause the adaptations that these errors engender. If so, a rat working for optogenetic stimulation of dopamine neurons will repeatedly update its policy and/or action values, thus iteratively increasing its work rate. Here, we challenge this view by demonstrating stable, non-maximal work rates in the face of repeated optogenetic stimulation of midbrain dopamine neurons. Furthermore, we show that rats learn to discriminate between world states distinguished only by their history of dopamine activation. Comparison of these results to reinforcement learning simulations suggests that the induced dopamine transients acted more as rewards than RPEs. However, pursuit of dopaminergic stimulation drifted upwards over a time scale of days and weeks, despite its stability within trials. To reconcile the results with prior findings, we consider multiple roles for dopamine signalling.
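The contrast the abstract draws between stimulation acting as a reward and stimulation acting as an RPE can be illustrated with a toy delta-rule update. This is a minimal sketch, not the authors' simulation: it assumes a single state and a single action, and the function name `simulate`, the learning rate `alpha`, and the stimulation magnitude `stim` are hypothetical choices made only for illustration.

```python
def simulate(mode, n_trials=200, alpha=0.1, stim=1.0):
    """Track one action value q under two hypothetical readings of stimulation.

    mode="reward": stimulation is the reward, so the error is (stim - q).
    mode="rpe":    stimulation is injected directly as the error term.
    All parameter values are illustrative assumptions, not from the paper.
    """
    q = 0.0
    history = []
    for _ in range(n_trials):
        if mode == "reward":
            # The error shrinks as q approaches the reward magnitude.
            delta = stim - q
        elif mode == "rpe":
            # The error is fixed by the stimulation and never shrinks.
            delta = stim
        else:
            raise ValueError(f"unknown mode: {mode}")
        q += alpha * delta
        history.append(q)
    return history


reward_q = simulate("reward")
rpe_q = simulate("rpe")

# Under the reward reading the value settles near stim;
# under the RPE reading it keeps growing with every trial.
print(f"reward reading, final value: {reward_q[-1]:.3f}")
print(f"RPE reading, final value:    {rpe_q[-1]:.3f}")
```

In this toy setting, the reward reading yields a value that plateaus, which is at least compatible with stable, non-maximal responding, whereas the RPE reading drives the value upward without bound, corresponding to the iteratively increasing work rate that the abstract argues against.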

Updated: 2023-12-01