Abstract
Distributed vector representations are a key bridging point between connectionist and symbolic representations in cognition. It is unclear how uncertainty should be modelled in systems using such representations. In this paper we discuss how bundles of symbols in certain Vector Symbolic Architectures (VSAs) can be understood as defining an object that has a relationship to a probability distribution, and how statements in VSAs can be understood as being analogous to probabilistic statements. The aim of this paper is to show how (spiking) neural implementations of VSAs can be used to implement probabilistic operations that are useful in building cognitive models. We show how the similarity operator between continuous values represented as Spatial Semantic Pointers (SSPs), an example of a technique known as fractional binding, induces a quasi-kernel function that can be used in density estimation. Further, we sketch novel designs for networks that compute the entropy and mutual information of VSA-represented distributions, and demonstrate their performance when implemented as networks of spiking neurons. We also discuss the relationship between our technique and quantum probability, another technique proposed for modelling uncertainty in cognition. While we restrict ourselves to operators proposed for Holographic Reduced Representations and to representing real-valued data, we suggest that the methods presented in this paper should translate to any VSA where the dot product between fractionally bound symbols induces a valid kernel.
Data availability
The data used to generate the figures in this paper is available as Jupyter notebooks at www.gitlab.com/furlong/vsa-prob.
Code availability
The code used to generate the figures in this paper is available as Jupyter notebooks at www.gitlab.com/furlong/vsa-prob. Those notebooks additionally depend on code available at www.github.com/ctn-waterloo/ssp-bayesopt.
Notes
(1) The probability of any event is non-negative. (2) The probability of the entire sample space is 1. (3) The probability of a union of mutually exclusive events is the sum of their individual probabilities.
Depending on the desired kernel, there are more accurate encodings, see Sutherland and Schneider (2015).
The SPA admits other binding operators, e.g. the Vector-derived transformation binding of Gosmann and Eliasmith (2019).
In this paper we only present isotropic kernel approximations, but it is possible to have different length scales, h, for the different dimensions of \(\textbf{x}\). For all examples modelling a 2D Gaussian Mixture Model we fit a length scale for each dimension in the domain of the distribution.
The sinc function is not a common choice for a kernel, but it can be demonstrated to be better, in the sense of mean integrated square error, than the Epanechnikov kernel, which is commonly considered to be the “optimal” kernel (Tsybakov 2009, §1.3).
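To make the preceding two notes concrete, the following is a minimal numpy sketch (not the implementation used in the paper) of kernel density estimation with a product sinc kernel and a hypothetical per-dimension length scale h:

```python
import numpy as np

def sinc_kernel(u, v, h):
    """Product sinc kernel with one length scale per dimension.
    np.sinc(z) = sin(pi z) / (pi z), which integrates to 1 over the reals."""
    return np.prod(np.sinc((u - v) / h))

def kde(x, samples, h):
    """Sinc-kernel density estimate at query point x. The estimate can be
    negative, hence a quasi-probability."""
    return np.mean([sinc_kernel(x, s, h) for s in samples]) / np.prod(h)

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))                       # 2D standard normal data
print(kde(np.zeros(2), samples, h=np.array([0.5, 0.4])))  # true density ~0.159
```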
In this work, the activity of a ReLU neuron is given by \(a(t) = a_\text{max}\text{ReLU}(W\cdot x(t) + b)\), where \(a_\text{max} > 0\) is the maximum firing rate of the neuron. To recover probability values, we normalize all computed firing rates by \(a_\text{max}\), however, we elide that scaling from our notation.
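As an illustrative sketch of this rate model (the weights and bias below are hypothetical stand-ins, not trained values):

```python
import numpy as np

a_max = 100.0                     # maximum firing rate, Hz
W = np.array([0.3, -0.2, 0.5])    # hypothetical input weights
b = 0.1                           # hypothetical bias

def rate(x):
    """a(t) = a_max * ReLU(W . x(t) + b)."""
    return a_max * np.maximum(W @ x + b, 0.0)

x = np.array([0.2, 0.4, -0.1])
p_value = rate(x) / a_max         # normalize by a_max to recover a probability value
```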
References
Agarwal R, Chen Z, Sarma SV (2016) A novel nonparametric maximum likelihood estimator for probability density functions. IEEE Trans Pattern Anal Mach Intell 39(7):1294–1308
Anastasio TJ, Patton PE, Belkacem-Boussaid K (2000) Using Bayes’ rule to model multisensory enhancement in the superior colliculus. Neural Comput 12(5):1165–1187
Anderson CH, Van Essen DC (1994) Neurobiological computational systems. In: Computational intelligence imitating life, pp 213–222
Arimoto S (1977) Information measures and capacity of order \(\alpha \) for discrete memoryless channels. Topics in information theory
Arora A, Furlong PM, Fitch R et al (2019) Multi-modal active perception for information gathering in science missions. Auton Robot 43(7):1827–1853
Barber MJ, Clark JW, Anderson CH (2003) Neural representation of probabilistic information. Neural Comput 15(8):1843–1864
Bekolay T, Bergstra J, Hunsberger E et al (2014) Nengo: a python tool for building large-scale functional brain models. Front Neuroinform 7:48
Boerlin M, Denève S (2011) Spike-based population coding and working memory. PLoS Comput Biol 7(2):e1001080
Bogacz R (2015) Optimal decision making in the cortico-basal-ganglia circuit. In: An introduction to model-based cognitive neuroscience. Springer, pp 291–302
Bogacz R, Gurney K (2007) The basal ganglia and cortex implement optimal decision making between alternative actions. Neural Comput 19(2):442–477
Bogacz R, Larsen T (2011) Integration of reinforcement learning and optimal decision-making theories of the basal ganglia. Neural Comput 23(4):817–851
Born M (1926) Quantenmechanik der stoßvorgänge. Z Phys 38(11):803–827
Buesing L, Bill J, Nessler B et al (2011) Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Comput Biol 7(11):e1002211
Busemeyer JR, Bruza PD (2012) Quantum models of cognition and decision. Cambridge University Press
Busemeyer JR, Wang Z, Shiffrin RM (2015) Bayesian model comparison favors quantum over standard decision theory account of dynamic inconsistency. Decision 2(1):1
Chacón J, Montanero J, Nogales A et al (2007) On the existence and limit behavior of the optimal bandwidth for kernel density estimation. Stat Sin 17(1):289–300
Chater N, Oaksford M (2008) The probabilistic mind: prospects for Bayesian cognitive science. Oxford University Press, USA
Choo X, Eliasmith C (2010) A spiking neuron model of serial-order recall. In: Cattrambone R, Ohlsson S (eds) 32nd Annual conference of the cognitive science society. Cognitive Science Society, Portland, OR
Darlington TR, Beck JM, Lisberger SG (2018) Neural implementation of Bayesian inference in a sensorimotor behavior. Nat Neurosci 21(10):1442–1451
Davis KB (1975) Mean square error properties of density estimates. Ann Stat 3:1025–1030
Davis KB (1977) Mean integrated square error properties of density estimates. Ann Stat 5:530–535
Deneve S (2008) Bayesian spiking neurons I: inference. Neural Comput 20(1):91–117
Doya K (2021) Canonical cortical circuits and the duality of Bayesian inference and optimal control. Curr Opin Behav Sci 41:160–167
Doya K, Ishii S, Pouget A et al (2007) Bayesian brain: probabilistic approaches to neural coding. MIT Press
Dumont N, Eliasmith C (2020) Accurate representation for spatial cognition using grid cells. In: CogSci
Echeveste R, Aitchison L, Hennequin G et al (2020) Cortical-like dynamics in recurrent circuits optimized for sampling-based probabilistic inference. Nat Neurosci 23(9):1138–1149
Eliasmith C (2013) How to build a brain: a neural architecture for biological cognition. Oxford University Press
Eliasmith C, Anderson CH (2003) Neural engineering: computation, representation, and dynamics in neurobiological systems. MIT Press, Berlin
Eliasmith C, Stewart TC, Choo X et al (2012) A large-scale model of the functioning brain. Science 338(6111):1202–1205
Elliott L, Eliasmith C (2009) MCMC with spiking neurons. In: NIPS workshop on Bayesian inference in the brain
Faisal AA, Selen LP, Wolpert DM (2008) Noise in the nervous system. Nat Rev Neurosci 9(4):292–303
Fehr S, Berens S (2014) On the conditional Rényi entropy. IEEE Trans Inf Theory 60(11):6801–6810
Frady EP, Kleyko D, Kymn CJ, et al (2021) Computing on functions using randomized vector representations. arXiv preprint arXiv:2109.03429
Furlong PM, Eliasmith C (2022) Fractional binding in vector symbolic architectures as quasi-probability statements. In: Proceedings of the annual meeting of the cognitive science society
Gayler RW (2004) Vector symbolic architectures answer Jackendoff’s challenges for cognitive neuroscience. arXiv preprint cs/0412059
Glad IK, Hjort NL, Ushakov NG (2003) Correction of density estimators that are not densities. Scand J Stat 30(2):415–427
Glad IK, Hjort NL, Ushakov N (2007) Density estimation using the sinc kernel. Preprint Statistics, vol 2, p 2007
Goodman ND, Tenenbaum JB, Contributors TP (2016) Probabilistic models of cognition. http://probmods.org/v2. Accessed 23 Jan 2023
Gosmann J (2015) Precise multiplications with the NEF. Tech. rep, Centre for Theoretical Neuroscience, Waterloo, ON
Gosmann J, Eliasmith C (2019) Vector-derived transformation binding: an improved binding operation for deep symbol-like processing in neural networks. Neural Comput 31(5):849–869. https://doi.org/10.1162/neco_a_01179
Hou H, Zheng Q, Zhao Y et al (2019) Neural correlates of optimal multisensory decision making under time-varying reliabilities with an invariant linear probabilistic population code. Neuron 104(5):1010–1021
Hoyer P, Hyvärinen A (2002) Interpreting neural response variability as Monte Carlo sampling of the posterior. In: Advances in neural information processing systems, vol 15
Huang Y, Rao RP (2014) Neurons as Monte Carlo samplers: Bayesian inference and learning in spiking networks. In: Advances in neural information processing systems, vol 27
Joshi A, Halseth JT, Kanerva P (2017) Language geometry using random indexing. In: Quantum interaction: 10th international conference, QI 2016, San Francisco, CA, USA, July 20–22, 2016, Revised Selected Papers 10. Springer, pp 265–274
Kanerva P (1996) Binary spatter-coding of ordered k-tuples. In: Artificial neural networks-ICANN 96: 1996 international conference Bochum, Germany, July 16–19, 1996 Proceedings 6. Springer, pp 869–873
Kanerva P (2009) Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognit Comput 1:139–159
Kappel D, Habenschuss S, Legenstein R et al (2015a) Network plasticity as Bayesian inference. PLoS Comput Biol 11(11):e1004485
Kappel D, Habenschuss S, Legenstein R et al (2015b) Synaptic sampling: a Bayesian approach to neural network plasticity and rewiring. Adv Neural Inf Process Syst 28:370–378
Kleyko D, Davies M, Frady EP, et al (2021) Vector symbolic architectures as a computing framework for nanoscale hardware. arXiv preprint arXiv:2106.05268
Kleyko D, Davies M, Frady EP et al (2022) Vector symbolic architectures as a computing framework for emerging hardware. Proc IEEE 110(10):1538–1571
Komer B (2020) Biologically inspired spatial representation. PhD thesis, University of Waterloo
Korcsak-Gorzo A, Müller MG, Baumbach A et al (2022) Cortical oscillations support sampling-based computations in spiking neural networks. PLoS Comput Biol 18(3):e1009753
Krause A, Singh A, Guestrin C (2008) Near-optimal sensor placements in Gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res 9(2):235
Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338
Levy SD, Gayler R (2008) Vector symbolic architectures: a new building material for artificial general intelligence. In: Proceedings of the 2008 conference on artificial general intelligence 2008: proceedings of the first AGI conference, pp 414–418
Loredo T (2003) Bayesian adaptive exploration in a nutshell. Stat Probl Particle Phys Astrophys Cosmol 1:162
Ma WJ, Beck JM, Latham PE et al (2006) Bayesian inference with probabilistic population codes. Nat Neurosci 9(11):1432–1438
Ma WJ, Beck JM, Pouget A (2008) Spiking networks for Bayesian inference and choice. Curr Opin Neurobiol 18(2):217–222
Mainen ZF, Sejnowski TJ (1995) Reliability of spike timing in neocortical neurons. Science 268(5216):1503–1506
Masset P, Zavatone-Veth J, Connor JP et al (2022) Natural gradient enables fast sampling in spiking neural networks. Adv Neural Inf Process Syst 35:22018–22034
Mundy A (2017) Real time Spaun on SpiNNaker functional brain simulation on a massively-parallel computer architecture. The University of Manchester (United Kingdom)
Mutnỳ M, Krause A (2019) Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. Adv Neural Inf Process Syst 31:9005–9016
Neubert P, Schubert S, Protzel P (2019) An introduction to hyperdimensional computing for robotics. KI-Künstl Intell 33(4):319–330
Plate TA (1992) Holographic recurrent networks. In: Advances in neural information processing systems, vol 5
Plate TA (1994) Distributed representations and nested compositional structure. University of Toronto, Department of Computer Science
Plate TA (1995) Holographic reduced representations. IEEE Trans Neural Netw 6(3):623–641
Plate TA (2003) Holographic reduced representation: distributed representation for cognitive structures. CSLI Publications, Stanford
Pothos EM, Busemeyer JR (2013) Can quantum probability provide a new direction for cognitive modeling? Behav Brain Sci 36(3):255–274
Pothos EM, Busemeyer JR (2022) Quantum cognition. Annu Rev Psychol 73:749–778
Pouget A, Dayan P, Zemel RS (2003) Inference and computation with population codes. Annu Rev Neurosci 26(1):381–410
Rahimi A, Recht B (2007) Random features for large-scale kernel machines. In: Advances in neural information processing systems, vol 20
Rao RP (2004) Bayesian computation in recurrent neural circuits. Neural Comput 16(1):1–38
Rényi A, et al (1961) On measures of entropy and information. In: Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, Berkeley, California, USA
Rosenblatt M (1969) Conditional probability density and regression estimators. In: Multivariate analysis II, pp 25–31
Rule JS, Piantadosi S, Tenenbaum J (2022) Learning as programming: modeling efficient search in human concept learning. In: Proceedings of the annual meeting of the cognitive science society
Salinas E, Abbott L (1994) Vector reconstruction from firing rates. J Comput Neurosci 1(1–2):89–107
Sanborn AN, Chater N (2016) Bayesian brains without probabilities. Trends Cognit Sci 20(12):883–893
Savin C, Denève S (2014) Spatio-temporal representations of uncertainty in spiking neural networks. In: Advances in neural information processing systems, vol 27
Schlegel K, Neubert P, Protzel P (2020) A comparison of vector symbolic architectures. arXiv preprint arXiv:2001.11797
Schneider M (2017) Expected similarity estimation for large-scale anomaly detection. PhD thesis, Universität Ulm
Schneider M, Ertel W, Ramos F (2016) Expected similarity estimation for large-scale batch and streaming anomaly detection. Mach Learn 105(3):305–333
Sharma S (2018) Neural plausibility of Bayesian inference. Master’s thesis, University of Waterloo
Sharma S, Voelker A, Eliasmith C (2017) A spiking neural Bayesian model of life span inference. In: CogSci
Smolensky P (1990) Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif Intell 46(1–2):159–216
Stewart TC, Eliasmith C (2013) Realistic neurons can compute the operations needed by quantum probability theory and other vector symbolic architectures. Behav Brain Sci 36(3):307
Stewart TC, Choo X, Eliasmith C, et al (2010) Dynamic behaviour of a spiking model of action selection in the basal ganglia. In: Proceedings of the 10th international conference on cognitive modeling, Citeseer, pp 235–40
Sutherland DJ, Schneider J (2015) On the error of random Fourier features. arXiv preprint arXiv:1506.02785
Tsybakov AB (2009) Introduction to nonparametric estimation. Springer
Voelker AR (2020) A short letter on the dot product between rotated Fourier transforms. arXiv preprint arXiv:2007.13462
Voelker AR, Blouw P, Choo X et al (2021) Simulating and predicting dynamical systems with spatial semantic pointers. Neural Comput 33(8):2033–2067
Walker EY, Cotton RJ, Ma WJ et al (2020) A neural basis of probabilistic computation in visual cortex. Nat Neurosci 23(1):122–129
Wand MP, Jones M (1995) Kernel smoothing. Monographs on statistics and applied probability, vol 60, 1st edn. Chapman & Hall, London
Xu K, Srivastava A, Gutfreund D et al (2021) A Bayesian-symbolic approach to reasoning and learning in intuitive physics. Adv Neural Inf Process Syst 34:2478–2490
Zemel R, Dayan P, Pouget A (1996) Probabilistic interpretation of population codes. In: Advances in neural information processing systems, vol 9
Acknowledgements
The authors would like to thank Nicole Sandra-Yaffa Dumont, Drs. Jeff Orchard, Bryan Tripp, and Terry Stewart for discussions that helped improve this paper. An early version of this work appeared in Furlong and Eliasmith (2022). This work was supported by CFI and OIT infrastructure funding as well as the Canada Research Chairs program, NSERC Discovery grant 261453, NUCC NRC File A-0028850, AFOSR grant FA9550-17-1-0026, and an Intel Neuromorphic Research Community Grant.
Funding
This work was supported by CFI and OIT infrastructure funding as well as the Canada Research Chairs program, NSERC Discovery Grant 261453, NUCC NRC File A-0028850, and AFOSR Grant FA9550-17-1-0026, and an Intel Neuromorphic Research Community Grant.
Author information
Contributions
PMF conceived and designed the initial study. CE and PMF discussed and updated the design. Material preparation, data collection and analysis were performed by PMF. The first draft of the manuscript was written by PMF with extensive revision and contribution from CE. CE supervised and administered the project. PMF and CE acquired funding for the project. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
Chris Eliasmith has a financial interest in Applied Brain Research, Incorporated, holder of patents related to the material in this paper (patent 62/820,089). P. Michael Furlong has performed consulting services for Applied Brain Research. Neither the company nor this collaboration affected the authenticity or objectivity of the experimental results of this work. The funders had no role in the direction of this research; in the analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1 Model complexity analysis
We presented an algebraic interpretation of VSA operations and the results of spiking neural implementations of these algorithms. Next, we present an analysis of the complexity of these networks. We frame the analysis in terms of the number of synaptic operations, which are simple additions in a spiking neural network, or multiply-and-accumulate operations when implemented on graphics processors or general-purpose CPUs. Because we implemented these networks using spiking rectified linear neurons, we do not account for the complexity of neural dynamics in this analysis. To estimate the complexity, we require the quantities laid out in Table 1. The analyses are summarized in Big-O notation in Table 2.
To produce probability estimates using single neurons, we have an input of dimension d and a neural population of size 1, meaning that estimating the probability of a single observation requires d synaptic operations. To estimate a probability density using a population of neurons that represents \(n_\text{sampling}\) sampled points for each input dimension, the number of synaptic operations is \(d(n_\text{sampling})^{m}\).
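A sketch of these operation counts, with random vectors standing in for actual SSP encodings and memory bundles:

```python
import numpy as np

d, n_sampling, m = 512, 64, 1
rng = np.random.default_rng(1)
memory = rng.normal(size=d) / np.sqrt(d)   # stand-in for a bundle of observed SSPs

# Single query: one neuron with d inputs -> d synaptic operations.
query = rng.normal(size=d) / np.sqrt(d)    # stand-in for the query-point SSP
p_hat = np.maximum(memory @ query, 0.0)

# Sampled distribution: one neuron per grid point over an m-D grid
# -> d * n_sampling**m synaptic operations in total.
grid = rng.normal(size=(n_sampling**m, d)) / np.sqrt(d)
p_grid = np.maximum(grid @ memory, 0.0)
```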
Pre-rectification marginalization requires one linear mapping in SSP space, which is a \(d\times d\) operation. If the SSP is represented by a population of neurons, then we also require a mapping from the neural population to the SSP latent space, which takes \(n_\text{dim}\) synaptic operations for each of the d SSP dimensions. Hence, pre-rectification marginalization requires \(d^{2} + d n_\text{dim}\) synaptic operations. The marginalizing matrix can be computed off-line for each dimension and is not included in the analysis.
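A sketch of this accounting, with a random stand-in for the marginalizing matrix M and for the trained decoders:

```python
import numpy as np

d, n_dim = 512, 2000
rng = np.random.default_rng(2)
decoders = rng.normal(size=(d, n_dim))   # population activity -> SSP latent space
M = rng.normal(size=(d, d))              # marginalizing matrix, computed off-line

activities = rng.random(n_dim)           # stand-in population activity
ssp = decoders @ activities              # d * n_dim synaptic operations
marginal_ssp = M @ ssp                   # d * d synaptic operations
```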
Post-rectification marginalization requires first computing a sampled probability distribution, which takes \(d(n_\text{sampling})^{m}\) operations, followed by summation over the \(m_\text{marg}\) marginalized dimensions. The latter requires computing \((n_\text{sampling})^{m_\text{marg}}\) sums for each of the \((n_\text{sampling})^{m-m_\text{marg}}\) points in the unmarginalized dimensions. This results in a synaptic complexity of \(d(n_\text{sampling})^{m} + (n_\text{sampling})^{m}\).
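For example, summing out the marginalized axes of a rectified sample grid (a numpy sketch, not a spiking implementation):

```python
import numpy as np

n_sampling, m, m_marg = 32, 2, 1
rng = np.random.default_rng(3)
p = np.maximum(rng.normal(size=(n_sampling,) * m), 0.0)  # sampled, rectified values

# (n_sampling)**(m - m_marg) outputs, each accumulating (n_sampling)**m_marg
# terms: (n_sampling)**m additions in total.
p_marginal = p.sum(axis=tuple(range(m - m_marg, m)))
```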
Conditioning requires computing the binding operator of the HRR VSA, which is circular convolution. In this work we use the default Nengo implementation of binding between two d-dimensional vectors, \(\textbf{a}\) and \(\textbf{b}\), which produces a new d-dimensional vector, \(\textbf{c} = \textbf{a}\circledast \textbf{b}\). The circular convolution is implemented by a series of rotated dot products, defined: \(c_{j} = \sum _{i=0}^{d-1} a_{i}b_{((i-j)\,\text{mod}\,\, d)}\), for \(j = 0,\ldots ,d-1\).
The multiplication of individual vector elements, \(a_{i}b_{((i-j) \text{mod}\,\, d)}\), is computed using a product network (Gosmann 2015), which requires \(3n_\text{prod}\) synaptic operations. Computing the entire circular convolution requires d such products for each of the d elements of the output vector, resulting in a complexity of \(d^{2}\times 3n_\text{prod}\) synaptic operations.
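A numpy sketch of this binding operation written as rotated dot products, matching the elementwise form above (Nengo's own implementation differs in its internal details):

```python
import numpy as np

def circ_bind(a, b):
    """Circular convolution as d rotated dot products, using the indexing
    from the text: c_j = sum_i a_i * b[(i - j) mod d]. Each output element
    needs d pairwise products, hence d*d product networks and
    3 * d**2 * n_prod synaptic operations overall."""
    d = len(a)
    return np.array([a @ b[(np.arange(d) - j) % d] for j in range(d)])

rng = np.random.default_rng(4)
a, b = rng.normal(size=(2, 8)) / np.sqrt(8)
c = circ_bind(a, b)   # FFT-based implementations reduce the O(d**2) cost to O(d log d)
```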
Computing entropy as described in this paper again requires first constructing a sampling of the distribution, which is \(O(d (n_\text{sampling})^{m})\), followed by computing \(-p\log p\) for every neuron in the distribution, which we implement with a single-hidden-layer neural network containing \(n_\text{log}\) neurons. This requires \((n_\text{sampling})^{m}\times 2n_\text{log}\) synaptic operations. This is followed by a population of \(n_\text{ent}\) neurons representing the sum \(\sum _{i} -p_{i}\log p_{i}\), which requires \(2n_\text{ent}(n_\text{sampling})^{m}\) synaptic operations. The function \(p\log p\) can be difficult to compute and requires substantial neural resources, so we assume in the worst case that \(n_\text{ent} = n_\text{log}\). The total cost to compute the entropy of a distribution is then \(4n_\text{log}(n_\text{sampling})^{m} + O(d (n_\text{sampling})^{m})\).
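For reference, a numpy sketch of the same quantity computed exactly (the network approximates \(-p\log p\) with \(n_\text{log}\) hidden neurons; the small constant guarding the log at zero is our addition):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a sampled distribution: sum_i -p_i log p_i.
    p is the vector of rectified probability values over the grid."""
    p = np.maximum(p, 0.0)
    p = p / p.sum()                       # normalize the sampled values
    return float(-np.sum(p * np.log(p + eps)))

print(entropy(np.ones(16)))               # uniform over 16 points -> log(16) ~ 2.77
```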
To compute mutual information we must first sample the joint probability distribution, which can be over two vector-valued variables. If we have two variables, \(X_1 \in \mathbb {R}^{m_1}\) and \(X_2 \in \mathbb {R}^{m_2}\), we will assume that \(m = \max \{m_{1},m_{2}\}\). Then the initial distribution representation requires \(d(n_\text{sampling})^{2m}\) synaptic operations.
The joint distribution is then marginalized twice, assuming post-rectification for increased accuracy. Exploiting the initial distribution representation permits a complexity of \(2(n_\text{sampling})^{2m}\). We then must compute the entropy of the joint distribution and of the two marginal distributions, which requires \(4n_\text{log}(n_\text{sampling})^{2m}\) and \(8n_\text{log}(n_\text{sampling})^{m}\) synaptic operations, respectively. This results in a final complexity of \((d + 4n_\text{log} + 2)(n_\text{sampling})^{2m} + 8n_\text{log}(n_\text{sampling})^{m}\).
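The corresponding reference computation, using the identity \(I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2)\) on a sampled joint grid (a numpy sketch, not the spiking network):

```python
import numpy as np

def mutual_information(p_joint, eps=1e-12):
    """I(X1; X2) from a normalized joint grid of shape (n1, n2)."""
    def H(p):
        p = p[p > eps]                    # drop zero cells before the log
        return float(-np.sum(p * np.log(p)))
    p1 = p_joint.sum(axis=1)              # marginal over X2
    p2 = p_joint.sum(axis=0)              # marginal over X1
    return H(p1) + H(p2) - H(p_joint.ravel())

px, py = np.array([0.25, 0.75]), np.array([0.5, 0.5])
print(mutual_information(np.outer(px, py)))   # ~0: independent variables
```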
Finally, we consider the cost to update an SSP representation of a distribution. To update the distribution, we take the neural population whose latent space represents the distribution using SSPs and project it into the SSP space, which requires \(dn_\text{dim}\) synaptic operations. Then we add the new observation to it and store the result in the neural population, which requires 2d multiplications (at most \(O(b^{2})\) bit operations for b-bit numbers, depending on the implementation) to compute the running average, and \(2dn_\text{dim}\) synaptic operations to update the population representing the distribution. We note that for more biologically plausible implementations the running average may be replaced by a low-pass filter, whose constant multiplication terms can be integrated directly into the synaptic weights.
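A sketch of the two update rules described above (the filter constant alpha is a hypothetical choice):

```python
import numpy as np

def running_average_update(memory, new_ssp, n_obs):
    """Exact running average of bundled SSPs after n_obs prior observations."""
    return (n_obs * memory + new_ssp) / (n_obs + 1)

def lowpass_update(memory, new_ssp, alpha=0.01):
    """Biologically plausible variant: a first-order low-pass filter whose
    constants can be folded directly into the synaptic weights."""
    return (1.0 - alpha) * memory + alpha * new_ssp
```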
The above analysis counts the synaptic operations required to compute the probabilistic operations. This measures the resources required to construct the networks, as well as the total volume of computation that must be executed. However, many of these operations can be parallelized, and on an appropriate computing framework the latency between presenting an input and computing these operations can be reduced significantly.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Furlong, P.M., Eliasmith, C. Modelling neural probabilistic computation using vector symbolic architectures. Cogn Neurodyn (2023). https://doi.org/10.1007/s11571-023-10031-7