Meta predictive learning model of languages in neural circuits
Physical Review E (IF 2.4), Pub Date: 2024-04-12, DOI: 10.1103/physreve.109.044309
Chan Li, Junbin Qiu, Haiping Huang

Large language models based on self-attention mechanisms have achieved astonishing performance, not only on natural language itself but also on a variety of tasks of a very different nature. However, when processing language, the human brain may not operate on the same principles. This raises a debate about the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize prediction error through local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike-and-slab distribution and that only the distribution, rather than specific weights, is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are fed into the network in sequence, and moreover on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections retain a higher level of variability. The performance of the resulting network ensemble changes continuously with data load and improves further with more training data, in analogy with the emergent behavior of large language models. Therefore, our model provides a starting point for investigating the connection among brain computation, next-token prediction, and general intelligence.
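The abstract's central technical idea, training the distribution of each synaptic weight rather than the weights themselves, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch layer (not the authors' released code; the class name, parameterization, and local-reparameterization forward pass are assumptions made for illustration) in which every connection carries a spike-and-slab distribution and only its parameters are trainable.

```python
# Hypothetical sketch of a mean-field layer with spike-and-slab weights.
# Only the distribution parameters (spike probability pi, slab mean m,
# slab standard deviation sigma) are trained, not individual weights.
import torch
import torch.nn as nn

class SpikeSlabLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Trainable distribution parameters, one set per connection.
        self.logit_pi = nn.Parameter(torch.zeros(out_dim, in_dim))          # P(weight nonzero), via sigmoid
        self.m = nn.Parameter(0.01 * torch.randn(out_dim, in_dim))          # slab mean
        self.log_sigma = nn.Parameter(-2.0 * torch.ones(out_dim, in_dim))   # slab std (log scale)

    def forward(self, x):
        pi = torch.sigmoid(self.logit_pi)
        sigma2 = torch.exp(2.0 * self.log_sigma)
        # First two moments of each weight under the spike-and-slab distribution:
        # w ~ (1 - pi) * delta(w) + pi * N(m, sigma^2).
        w_mean = pi * self.m
        w_var = pi * (self.m ** 2 + sigma2) - w_mean ** 2
        # Mean-field forward pass: by the central limit theorem the pre-activation
        # is approximately Gaussian, so sample it directly (local reparameterization).
        a_mean = x @ w_mean.t()
        a_var = (x ** 2) @ w_var.t()
        eps = torch.randn_like(a_mean)
        return a_mean + torch.sqrt(a_var + 1e-8) * eps
```

Training such a layer with an ordinary classification or next-token loss updates only pi, m, and sigma, so what is learned is effectively an ensemble of networks; the observation that most connections become deterministic after learning would then correspond, in this sketch, to the spike probability and slab variance collapsing for those connections.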

Updated: 2024-04-12