A physics-inspired approach to the understanding of molecular representations and models

Abstract

The story of machine learning in general, and of its application to molecular design in particular, has been a tale of evolving data representations. Understanding the implications of using a particular representation, including the existence of so-called ‘activity cliffs’ for cheminformatics models, is key to the successful use of representations and models in molecular discovery. In this work we present a physics-inspired methodology that exploits analogies between model response surfaces and energy landscapes to richly describe the relationship between the representation and the model. From these similarities emerges a metric analogous to the frustration metric commonly used in the chemical physics community. This new metric achieves state-of-the-art prediction of model error. It also belongs to a novel class of roughness measures that extend beyond the known data, allowing activity cliffs to be identified trivially even in the absence of related training or evaluation data.
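The abstract names activity cliffs without defining them: pairs of structurally similar molecules whose measured activities differ sharply. As a point of reference only, and not the frustration-based metric this paper proposes, the minimal sketch below flags such pairs with a conventional pairwise index, the Structure-Activity Landscape Index (SALI), |A_i − A_j| / (1 − sim(i, j)). It assumes fingerprints are given as sets of "on" bit indices and that similarity is Tanimoto; the thresholds (0.7 similarity, 1.0 log unit of activity) and the toy data are illustrative assumptions.

```python
from itertools import combinations

# Illustration only: SALI is a standard pairwise activity-cliff index,
# not the frustration-based roughness measure described in this paper.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def sali(act_i: float, act_j: float, sim: float, eps: float = 1e-6) -> float:
    """SALI = |A_i - A_j| / (1 - sim); eps avoids division by zero for identical fingerprints."""
    return abs(act_i - act_j) / max(1.0 - sim, eps)

def find_cliffs(fps, acts, sim_threshold=0.7, delta_threshold=1.0):
    """Return (i, j, similarity, SALI) for every pair that is structurally similar
    (Tanimoto >= sim_threshold) yet differs in activity by at least delta_threshold."""
    cliffs = []
    for i, j in combinations(range(len(fps)), 2):
        sim = tanimoto(fps[i], fps[j])
        if sim >= sim_threshold and abs(acts[i] - acts[j]) >= delta_threshold:
            cliffs.append((i, j, sim, sali(acts[i], acts[j], sim)))
    return cliffs

# Hypothetical toy data: in practice the bit sets would come from a cheminformatics
# toolkit (e.g. Morgan/ECFP fingerprints) and activities from assays (e.g. pIC50).
# Molecules 0 and 1 differ by a single bit but by 3 log units in activity.
fps = [{1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 6, 7, 9}, {10, 11, 12, 13}]
acts = [5.0, 8.0, 5.2]

for i, j, sim, score in find_cliffs(fps, acts):
    print(f"cliff between {i} and {j}: Tanimoto={sim:.2f}, SALI={score:.1f}")
    # prints "cliff between 0 and 1: Tanimoto=0.78, SALI=13.5"
```

An index of this kind needs measured activities for both members of every pair, which is precisely the dependence on known data that the abstract says the proposed roughness measure avoids.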

Article information

Article type: Paper
Submitted: 07 Dec 2023
Accepted: 22 Feb 2024
First published: 01 Mar 2024
This article is Open Access under a Creative Commons BY license.

Mol. Syst. Des. Eng., 2024, Advance Article

L. Dicks, D. E. Graff, K. E. Jordan, C. W. Coley and E. O. Pyzer-Knapp, Mol. Syst. Des. Eng., 2024, Advance Article, DOI: 10.1039/D3ME00189J

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
