
Simplicity science

  • Review paper
  • Published:
Indian Journal of Physics

Abstract

In recent years, massive amounts of data have become available on a variety of systems, from biology and neuroscience to economics and the social sciences. This, together with increasing computational power, has led to a surge of approaches based on large computational models, which are particularly suited to situations where the underlying “laws of motion” of such complex systems are unknown. I will argue that approaches aimed at extracting simple models or principles from complex systems or from large datasets are still possible. These rely on advances in our understanding of collective phenomena, which provide a wealth of powerful methods for distilling simple models from complex phenomena. Furthermore, information theory, considered as a universal language for describing complex systems, provides simple principles that can be used both in modelling and in inference from large datasets.


Notes

  1. "It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience". From A. Einstein, “On the Method of Theoretical Physics”, lecture delivered at Oxford, 10 June 1933.

  2. Large machine learning models also defy the classical bias-variance trade-off of statistics, which maintains that the number of parameters of a model should be bounded by the size of the dataset in order to avoid overfitting. Indeed, models like deep neural networks operate in an unconventional statistical regime where they interpolate the data points exactly, and generalisation improves as the number of parameters increases [39, 63]. Even in the prototypical example of restricted Boltzmann machines trained on the MNIST dataset, the number of parameters can easily exceed the number of images used in training.
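
The interpolation phenomenon mentioned above can be illustrated with a minimal sketch (not from the paper; the data and model are assumptions for illustration): once a least-squares polynomial has as many parameters as there are training points, it fits the data exactly, which is the interpolating regime discussed in [39, 63].

```python
import numpy as np

# Toy data: a noisy sine, 10 training points (illustrative choices).
rng = np.random.default_rng(0)
n = 10
x = np.linspace(-1.0, 1.0, n)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

# Compare an under-parameterised fit with the interpolating one:
# a degree-(n-1) polynomial has n parameters and interpolates n points exactly.
for degree in (2, n - 1):
    coeffs = np.polyfit(x, y, degree)          # degree + 1 parameters
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: training MSE = {train_err:.2e}")
```

The point of the sketch is only the training error; whether interpolation also generalises well, as for deep networks, depends on the model class and is the subject of the cited works.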

  3. When applicable, Landau’s approach of deriving models from the power expansion of the relevant variables and their derivatives, retaining only terms which are consistent with the symmetries and conservation laws, is a very powerful one. For an application to the dynamics of financial correlations, see [36].

  4. Let me illustrate this point with the example of the 2008 financial crisis. For no apparent reason, on 9 August 2007, liquidity in the interbank market suddenly dried up, with short-term interest rates jumping to historical highs. This was the culmination of a series of events, detailed in Ref. [12], that later unfolded into a full-blown global financial crisis. Morris and Shin [44] highlighted the mechanism by which a single bank can get into trouble because its investors lose confidence. Scaling up these insights from the individual bank to the whole banking system, we have shown [2] that the sudden evaporation of trust among market participants that occurred on 9 August 2007 can be reproduced as a first-order phase transition. This sheds light on the determinants of the crisis (loan maturity mismatch, counterparty risk and transparency) as well as on recovery policies.

  5. This seems to be the case also in physics, at the time of writing.

  6. Zipf’s law states that the \(k^\textrm{th}\) most frequent item in a population occurs with a frequency that is one \(k^\textrm{th}\) of that of the most frequent one.
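
As a hedged numerical illustration of this statement (the rank range is an arbitrary choice): frequencies \(f_k = f_1/k\) form a power law with exponent 1, which can be recovered by a log-log fit of frequency against rank.

```python
import numpy as np

# Ideal Zipf frequencies: the k-th most frequent item occurs 1/k as often
# as the most frequent one (f_1 = 1 here, ranks 1..100 chosen arbitrarily).
ranks = np.arange(1, 101)
freqs = 1.0 / ranks

# log f_k = log f_1 - alpha * log k, so the log-log slope gives the exponent.
alpha = -np.polyfit(np.log(ranks), np.log(freqs), 1)[0]
print(f"fitted Zipf exponent: {alpha:.3f}")
```

In empirical rank-frequency plots the exponent is estimated the same way, though real data deviate from the ideal line at large ranks.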

  7. For example, we describe how a population is distributed in space in terms of cities. Yet a more precise way of locating where people live would rely on their ZIP codes. It is unclear whether the population distribution by ZIP code would also display a power-law distribution. Likewise, Zipf’s law applies to the frequency of words in language, but not when considering words with the same functional role [1].

  8. The environment in physics—the so-called "heat bath"—does not need to be modelled, and it intervenes only through parameters, such as temperature, pressure and chemical potential, that govern exchanges of the system with it.

  9. A familiar example is that of the Ising model, where \(\textbf{s}=(s_1,\ldots ,s_n)\) is a configuration of spin variables \(s_i=\pm 1\), and the observables \(\phi ^\mu (\textbf{s})\) are single-spin variables \(s_i\) or products \(s_i s_j\). There is a rich literature [48] on using this model to describe systems such as neurons [57] or the US Supreme Court [33].
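
A minimal sketch of this example (fields and couplings are illustrative assumptions, not values from the paper): for a small pairwise Ising model, \(P(\textbf{s})\propto \exp(\sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j)\), the distribution and the expectation values of the observables \(\phi^\mu\) can be enumerated exactly.

```python
import itertools
import numpy as np

n = 3
h = np.array([0.2, -0.1, 0.0])                  # fields (assumed values)
J = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.1}    # couplings (assumed values)

def energy(s):
    """Ising energy E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j."""
    return -(h @ s + sum(Jij * s[i] * s[j] for (i, j), Jij in J.items()))

# Enumerate all 2^n spin configurations and normalise the Boltzmann weights.
states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
weights = np.array([np.exp(-energy(s)) for s in states])
P = weights / weights.sum()

# Expectations of the observables phi^mu(s) = s_i and s_i s_j.
m = sum(p * s for p, s in zip(P, states))               # magnetisations <s_i>
c01 = sum(p * s[0] * s[1] for p, s in zip(P, states))   # correlation <s_0 s_1>
print("sum P =", P.sum(), "| <s> =", m, "| <s0 s1> =", c01)
```

For larger \(n\) the \(2^n\) sum becomes intractable, which is why inference and sampling for such models usually rely on approximate methods.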

  10. In statistics, the \(\phi ^\mu (\textbf{s})\) are also called sufficient statistics, because their values are sufficient to estimate the parameters \(g^\mu\).

  11. Beretta et al. [8] have shown that, at least for spin variables, pairwise models are among the most complex ones, in terms of their minimum description length complexity.

  12. Duranthon et al. [19] have shown that a Gaussian neural network is unable to develop maximally informative internal representations, such as those that emerge for example in the internal layer of restricted Boltzmann machines.

  13. Sampling from these models is also simple: it does not require running Monte Carlo algorithms, as is needed for pairwise Ising models, for example.
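
For contrast with this note, here is a minimal Metropolis sketch of the kind of Monte Carlo sampling that pairwise Ising models typically require (the uniform coupling, field, and chain length are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, J, h = 10, 0.3, 0.1                  # fully connected model (assumed values)
s = rng.choice([-1, 1], size=n)         # random initial spin configuration

def local_field(s, i):
    """Field felt by spin i: h plus coupling to all other spins."""
    return h + J * (s.sum() - s[i])

# Metropolis dynamics: propose single-spin flips, accept with prob min(1, e^{-dE}).
for _ in range(5000):
    i = rng.integers(n)
    dE = 2 * s[i] * local_field(s, i)   # energy change from flipping spin i
    if dE <= 0 or rng.random() < np.exp(-dE):
        s[i] = -s[i]

print("sample magnetisation:", s.mean())
```

The chain must be run long enough to equilibrate before samples are representative; models from which one can draw exact independent samples directly avoid this cost.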

References

  1. L Aitchison, N Corradi and P E Latham PLoS Comput. Biol. 12 e1005110 (2016)

  2. K Anand, P Gai and M Marsili J. Econ. Dyn. Control 36 1088 (2012)

  3. C Anderson Wired Mag. 16 16 (2008)

  4. R L Axtell and J D Farmer J. Econ. Lit. (2022)

  5. S Azaele, S Suweis, J Grilli, I Volkov, J R Banavar and A Maritan Rev. Mod. Phys. 88 035003 (2016)

  6. A Baker Simplicity. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Summer 2022 edition (2022)

  7. F Battiston, E Amico, A Barrat, G Bianconi, G F de Arruda, B Franceschiello, I Iacopini, S Kéfi, V Latora, Y Moreno, et al. Nat. Phys. 17 1093 (2021)

  8. A Beretta, C Battistin, C De Mulatier, I Mastromatteo and M Marsili Entropy 20 10 739 (2018)

  9. J Bergelson, M Kreitman, D A Petrov, A Sanchez and M Tikhonov Elife 10 e67646 (2021)

  10. W Bialek Biophysics: searching for principles. Princeton University Press (2012)

  11. G Bianconi and M Marsili Europhys. Lett. 74 740 (2006)

  12. C E V Borio. The financial turmoil of 2007-?: a preliminary assessment and some policy considerations (2008)

  13. J-P Bouchaud and M Potters Theory of financial risk and derivative pricing: from statistical physics to risk management. Cambridge University Press (2003)

  14. J D Burgos and P Moreno-Tovar Biosystems 39 227 (1996)

  15. P Charbonneau, E Marinari, G Parisi, F Ricci-tersenghi, G Sicuro, F Zamponi and M Mezard. Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years. World Scientific (2023)

  16. R J Cubero, J Jo, M Marsili, Y Roudi and J Song. J. Stat. Mech.: Theory Exp. 2019 063402 (2019)

  17. R J Cubero, M Marsili and Y Roudi Multiscale relevance and informative encoding in neuronal spike trains. J. Comput. Neurosci. 48 85 (2020)

  18. C de Mulatier, P P Mazza and M Marsili Statistical inference of minimally complex models. arXiv preprint arXiv:2008.00520 (2020)

  19. O Duranthon, M Marsili and R Xie J. Stat. Mech.: Theory Exp. 2021 033409 (2021)

  20. D K Foley J. Econ. Theory 62 321 (1994)

  21. X Gabaix Q. J. Econ. 114 739 (1999)

  22. S Grigolon, S Franz and M Marsili Mol. BioSyst. 12 2147 (2016)

  23. P D Grünwald The minimum description length principle. MIT Press (2007)

  24. J Hidalgo, J Grilli, S Suweis, M A Muñoz, J R Banavar and A Maritan Proc. Nat. Acad. Sci. 111 10095 (2014)

  25. J J Hopfield Proc. Nat. Acad. Sci. 79 2554 (1982)

  26. J M Horowitz and T R Gingrich Nat. Phys. 16 15 (2020)

  27. S Hui et al. Mol. Syst. Biol. 11 784 (2015)

  28. M A Huynen and E Van Nimwegen Mol. Biol. Evol. 15 583 (1998)

  29. E T Jaynes Phys. Rev. 106 620 (1957)

  30. L P Kadanoff Turbulent heat flow: structures and scaling Phys. Today 54 34 (2001)

  31. A Kirman. Complex economics: individual and collective rationality. Routledge (2010)

  32. S Lakhal, A Darmon, I Mastromatteo, M Marsili and M Benzaquen Multiscale relevance of natural images. arXiv preprint arXiv:2303.12717 (2023)

  33. E D Lee, C P Broedersz and W Bialek Statistical mechanics of the us supreme court J. Stat. Phys. 160 275 (2015)

  34. M Marsili, I Mastromatteo and Y Roudi J. Stat. Mech: Theory Exp. 2013 P09003 (2013)

  35. M Marsili The Eur. Phys. J. B 55 169 (2007)

  36. M Marsili, G Raffaelli and B Ponsot J. Econ. Dyn. Control 33 1170 (2009)

  37. M Marsili and Y Roudi Phys. Rep. 963 1 (2022)

  38. A Mazzolini, M Gherardi, M Caselle, M C Lagomarsino and M Osella Phys. Rev. X 8 021023 (2018)

  39. S Mei and A Montanari Commun. Pure Appl. Math. 75 667 (2022)

  40. M Mézard, G Parisi and M A Virasoro Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company (1987)

  41. R Monasson and R Zecchina Phys. Rev. E 56 1357 (1997)

  42. T Mora and W Bialek J. Stat. Phys. 144 268 (2011)

  43. F Morcos et al. Proc. Nat. Acad. Sci. 108 E1293 (2011)

  44. S Morris and H S Shin. Global games: Theory and applications (2001)

  45. M A Munoz Rev. Mod. Phys. 90 031001 (2018)

  46. I J Myung, V Balasubramanian and M A Pitt Proc. Nat. Acad. Sci. 97 11170 (2000)

  47. M E J Newman Contemp. Phys. 46 323 (2005)

  48. H Chau Nguyen, R Zecchina and J Berg Adv. Phys. 66 197 (2017)

  49. J Nguyen, S T Powers, N Urquhart, T Farrenkopf and M Guckert Transp. Res. Indiscip. Perspect. 12 100486 (2021)

  50. B A Olshausen and D J Field Curr. Opin. Neurobiol. 14 481 (2004)

  51. OpenAI. Gpt-4 technical report, (2023)

  52. P A Ortega and D A Braun Proc. R. Soc. A: Math. Phys. Eng. Sci. 469 20120683 (2013)

  53. A Roli, M Villani, A Filisetti and R Serra J. Syst. Sci. Complexity 31 647 (2018)

  54. H A Simon Biometrika 42 425 (1955)

  55. D Sornette Phys. Rev. E 57 4811 (1998)

  56. J M Thornton, R A Laskowski and N Borkakoti Nat. Med. 27 1666 (2021)

  57. G Tkačik, T Mora, O Marre, D Amodei, S E Palmer, M J Berry and W Bialek Proc. Nat. Acad. Sci. 112 37 11508 (2015)

  58. G Tkačik and W Bialek Annu. Rev. Condens. Matter Phys. 7 89 (2016)

  59. E Wigner Commun. Pure Appl. Math. 13 1 (1960)

  60. D H Wolpert. What is important about the no free lunch theorems? In Black box optimization, machine learning, and no-free lunch theorems, pages 373–388. Springer, (2021)

  61. D H Wolpert and W G Macready IEEE Trans. Evol. Comput. 1 67 (1997)

  62. R Xie and M Marsili A simple probabilistic neural network for machine understanding (2023)

  63. C Zhang, S Bengio, M Hardt, B Recht and O Vinyals Commun. ACM 64 107 (2021)

  64. G K Zipf. Selected studies of the principle of relative frequency in language. Harvard university press (1932)


Author information

Correspondence to Matteo Marsili.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Marsili, M. Simplicity science. Indian J Phys (2024). https://doi.org/10.1007/s12648-024-03068-9

