
Learning human insight by cooperative AI: Shannon-Neumann measure

Published 9 April 2021 • © 2021 The Author(s). Published by IOP Publishing Ltd
Citation: Edouard Siregar 2021 IOP SciNotes 2 025001. DOI: 10.1088/2633-1357/abec9e


Abstract

A conceptually sound solution to a complex real-world challenge is built on a solid foundation of key insights, gained by posing 'good' questions at the 'right' times and places. If the foundation is weak, due to insufficient human insight, the resulting conceptually flawed solution can be very costly or impossible to correct downstream. The responses to the 2020 global pandemic by countries relying on just-in-time supply and production chains and fragmented health-care systems are striking examples. Here, artificial intelligence (AI) tools that assist human insight are of significant value. We present a computational measure of insight gains, which a cooperative AI agent can compute by maintaining a specific internal framework and by observing how a human behaves. This measure enables a cooperative AI to maximally boost human insight during an iterated questioning process: a solid foundation for solving complex open-ended challenges. It is an AI-Human insight bridge, built on Shannon entropy and von Neumann utility. Our next paper will address how this measure and its associated strategy reduce a hard cooperative inverse reinforcement learning game to simple Q-Learning, proven to converge to a near-optimal policy.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Insight, defined as a combination of discernment, understanding, and intuition, is critical in the conceptual phases of building solutions to open-ended real-world challenges, which involve a complex mixture of human, financial, and technological needs. Advances in big data and machine learning (ML) generated insights [1] can be extremely useful, yet they will remain profoundly incomplete as long as the human dimension is not well accounted for.

In the context of business, financial, or scientific insight and discovery, ML is extensively used to extract useful patterns by combining big data with human-generated prior domain knowledge, to enable interpretation and explanation. We approach the question from the complementary perspective of human-compatible artificial intelligence (AI) [2]: instead of taking humans out of the loop during computations (which risks producing an opaque 'black box' and missing the human dimension), we focus on human-centered AI that assists people in gaining insights by observing human behavior.

We propose a computational measure μI of insight I, enabling an AI to compute gains in I resulting from a cooperative questioning process between a human (H) and an AI cognitive assistant (Acog).

The framework for the AI's learning is a two-person Iterated Questioning (IQ) game: a set Q of (yes/no) questions qi  ∈ Q is posed to reach a goal G. The set Q is shared by both H and Acog, whose only objective is to maximize the total insight gained by H over the course of the cooperative game [3]. Each IQ game is played until H achieves a given long-term objective: reaching a minimum threshold of total insight gained. Reaching this threshold defines a learning episode for Acog. The game is 'solved' when optimal strategies (the shortest way to reach the threshold) are found for H and Acog.

For Acog to achieve its single-minded role, it needs to learn about human insight. It can learn this from the way we react during the IQ game. In this cooperative two-person game, Acog's decision policy is to suggest a probability-ranked list of questions ${q}_{i}\in Q;(i=1,\ldots ,n)$ it believes (in a Bayesian sense) are most insightful for H, and H's decision policy is to select the most promising questions ${q}_{j}\in Q$ to explore, and then signal how useful qj was.

This enables Acog to compute μI simply by observing H. The measure ${\mu }_{I}^{G}(Q)$ combines the (objective) Shannon entropy S with a user-defined (subjective) von Neumann utility U. We call it the Shannon-Neumann, or SN, measure of insight gains [3, 4].

The assistant Acog's only objective is to help H gain a maximum total amount of insight ${\mu }_{T}^{G}$ over the pursuit of a long-term objective. This approach is consistent with Stuart Russell's thesis [5] for designing human-compatible AI, where the AI assistant cooperatively realigns with H's short- and long-term goals.

2. Insight gains in a questioning process

2.1. Measuring insight gains

Gains of insight are normally defined as changes in discernment (precision increase), understanding (error correction, uncertainty reduction), and intuition (familiarity boost) about a topic T. We want to introduce a reasonable computational measure of the insight gains that result from exploring a set of questions. Discernment, understanding, and intuition about a topic T are normally gained by posing a set of 'good questions' at the 'right times and places', leading to stories, metaphors, analogies, models, and theories of T with increasing accuracy, precision, and relevance. In this paper we address how the AI agent learns what a 'good question' is (one that maximizes the insight gains). In the next paper [6], we model the notion of the 'right times/places' using cooperative inverse reinforcement learning [7].

The IQ game is a precision-boosting, uncertainty-reducing process: we start from an approximate understanding of T, whose precision increases and whose error decreases under iterated questioning. We formalize the IQ game using the simplest model that retains its essence: posing a suite of (yes/no) questions Q to achieve a goal G, which we describe next.

2.2. Framework for insight gains

We create a framework that enables us to separate the 'good questions' (more informative and insightful ones) from the 'bad questions' (less informative and insightful ones). The framework can be thought of as an elaborate form of the twenty-questions (yes/no) game.

A formal questioning framework F (our toy universe) is used to frame and measure insight gains $\delta {I}^{G}(Q)\equiv {\mu }_{I}^{G}(Q)$ towards achieving a goal G when answering questions in Q. The measure is built by combining Shannon information S [8] with a von Neumann utility measure U associated with a set of questions Q: we call it the SN-measure. The elements of Acog's toy cognitive universe $F(T,G,Q)$ are:

  • A topic T consisting of a bag of N balls, each with only two properties: ${color}(B,W)$ and ${size}(L,S)$. The long-term goal is to fully determine T.
  • A set G of short-term goals ${g}_{i}\in G;i=1,2$ about T.
  • A set Q of (yes/no) questions ${q}_{i}\in Q;i=1,\ldots ,4$. Q is the information source (in the information-theoretic sense).

The set Q is a discrete information source [9], as the answer to each qi produces a certain amount of information (and insight towards achieving G) by reducing the uncertainty or ambiguity about T.
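As a concrete illustration, here is a minimal Python sketch of the toy universe $F(T,G,Q)$. All names (Ball, topic_T, questions_Q) are our own illustrative choices, and the predicate for q4 is a placeholder, since its answer carries no information about T.

```python
from dataclasses import dataclass
import random

# Toy universe F(T, G, Q): a bag of N balls, each with two binary properties.
@dataclass(frozen=True)
class Ball:
    color: str  # 'B' or 'W'
    size: str   # 'L' or 'S'

N = 8
topic_T = [Ball(random.choice('BW'), random.choice('LS')) for _ in range(N)]

# Short-term goals about T (descriptions only; H judges their utility).
goals_G = {
    'g1': 'know the proportion of balls in T of each color',
    'g2': 'know the proportion of balls in T of each size',
}

# The shared (yes/no) questions, encoded as predicates on T.
questions_Q = {
    'q1': lambda T: sum(b.color == 'B' for b in T) == len(T) // 2,
    'q2': lambda T: sum(b.color == 'B' for b in T) > len(T) / 2,
    'q3': lambda T: sum(b.size == 'L' for b in T) > len(T) / 2,
    'q4': lambda T: False,  # 'did Sam place the balls?': no information about T
}
```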

In this IQ game, our long-term goal is to fully determine T's composition. The short-term goals ${g}_{i}\in G$, to gain insights about T, are:

  • ${g}_{1}\,=$ 'To know the proportion of balls in T of each color'
  • ${g}_{2}\,=$ 'To know the proportion of balls in T of each size'

The insight-generating (yes/no) questions qi  ∈ Q on the topic T could be, for example, the following:

  • ${q}_{1}\,=$ 'are there $N/2$ B balls in T?'
  • q2 =  'are the balls in T mostly B?'
  • q3 =  'are the balls in T mostly L?'
  • ${q}_{4}\,=$ 'did Sam place the balls in T?'

This simple framework allows us to express the necessary (and arguably sufficient) properties of a good measure ${\mu }_{I}^{G}(Q)$ of insight gains in mathematical terms.

2.3. Necessary properties of insight gains measures

We want to measure the insight ${\mu }_{I}^{G}(Q)$ gained by a 'yes' answer to each question qi  ∈ Q in the information source Q, with respect to each short-term goal ${g}_{j}\in G$.

The required properties ${P}_{i}\in P$ of a meaningful insight measure μ are the following:

  • P1: ${\mu }_{I}^{{g}_{1}}({q}_{1})\gt {\mu }_{I}^{{g}_{1}}({q}_{2})\gt 0$
  • P2: ${\mu }_{I}^{{g}_{1}}({q}_{2})\gt 0$ and ${\mu }_{I}^{{g}_{2}}({q}_{3})\gt 0$
  • P3: ${\mu }_{I}^{{g}_{1}}({q}_{3})=0$ and ${\mu }_{I}^{{g}_{2}}({q}_{2})=0$
  • P4: ${\mu }_{I}^{{g}_{1}}({q}_{4})=0$ and ${\mu }_{I}^{{g}_{2}}({q}_{4})=0$

The justifications for these requirements on a measure μ of insight gain are:

  • P1: ${\mu }_{I}^{{g}_{1}}({q}_{1})\gt {\mu }_{I}^{{g}_{1}}({q}_{2})$ because a 'yes' answer to q1 is more informative (reduces more uncertainty) than a 'yes' answer to q2, and both are useful for achieving g1.
  • P2: $\mu \gt 0$ when a 'yes' answer to qi is informative (about T) and useful for gj.
  • P3: μ = 0 when a 'yes' answer to qi is informative (about T), but qi is useless for gj.
  • P4: μ = 0 when a 'yes' answer to qi does not produce relevant information.

The intuitive interpretation is that for a question to be insightful (to produce new insight), it must be simultaneously informative (Shannon) about the topic T and useful (von Neumann) for achieving the goal G. In other words, we define insightful as both informative (uncertainty-, error-, and ignorance-reducing) about a topic and actionable (useful) for achieving a goal.

We can now construct the measure ${\mu }_{I}^{G}(Q)$ of insight gains, based on the required properties ${P}_{i}\in P;(i=1,\ldots ,4)$.

3. Shannon-Neumann measure ${\mu }_{I}^{G}(Q)$

Given the property requirements $P=\{{P}_{i};(i=1,\ldots ,4)\}$ for the measure of insight gains μ, we can now precisely define the measure ${\mu }_{I}^{G}(Q)$ in terms of the Shannon entropy S and a (human-evaluated) von Neumann utility U.

For our needs, we define the Shannon [10] information $S({q}_{i})$ produced by the 'yes' answer to a question qi as (log base 2):

$S({q}_{i})=-{\mathrm{log}}_{2}\,p({q}_{i})$          (1)

and the Shannon entropy S(Q) of an entire information source Q is the expected amount of information from the source Q:

$S(Q)=-{\sum }_{i=1}^{n}{p}_{i}\,{\mathrm{log}}_{2}\,{p}_{i}$          (2)

Relations (1) and (2) are the fundamental definitions of entropy in information theory [11] and, earlier, in statistical mechanics [12, 13]. In the context of our two-person cooperative IQ game, ${p}_{i}=p({q}_{i})$ is the probability that the answer to the (yes/no) question qi about the topic T is 'yes'.

Note that the more uncertainty (ignorance) the answer to qi resolves, the lower its probability $p({q}_{i})$ and the more informative it is. The information units are bits when the log base is 2. So a good estimator for the probabilities pi is the amount of uncertainty a 'yes' answer eliminates (just as 'good' questions in the twenty-questions game cut the possibilities in half; the same holds for the coin-weighing game to identify the single defective coin).

For example, if the random variable is the throw of a fair die, the answer to ${q}_{1}\,=$ 'is the outcome a 6?' is less probable ($p({q}_{1})=1/6$) and more informative than the answer to q2 = 'is the outcome an even number?' ($p({q}_{2})=1/2$): $S({q}_{1})={\mathrm{log}}_{2}\,6\approx 2.58$ bits $\gt S({q}_{2})={\mathrm{log}}_{2}\,2=1$ bit, while the 100% certain answer to q3 = 'is the outcome an integer?' adds no information: $S({q}_{3})={\mathrm{log}}_{2}\,1=0$ bits.
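A quick numerical check of equation (1) on this fair-die example (the helper name shannon_info is ours):

```python
import math

def shannon_info(p_yes: float) -> float:
    """Shannon information (bits) of a 'yes' answer of probability p_yes, eq. (1)."""
    return -math.log2(p_yes)

print(shannon_info(1/6))  # q1 'is the outcome a 6?'        -> ~2.585 bits
print(shannon_info(1/2))  # q2 'is the outcome even?'       -> 1.0 bit
print(shannon_info(1.0))  # q3 'is the outcome an integer?' -> -0.0, i.e. zero bits
```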

The Shannon entropy is consistent with some desirable properties of an insight-gains measure, but it is missing a key ingredient: utility with respect to achieving the goal g, to which we now turn. The search for new insight is usually done in the context of a specific target goal.

Standard utility, as defined by the von Neumann-Morgenstern utility theorem [15], has long been a foundation of economic and (rational) decision-making theory, and has successfully described some aspects of human decisions, but by no means all [15]. For our cooperative two-person IQ game, we use the simplest notion of utility that does the job (Occam's razor). The simplest utility function $U({q}_{i},{g}_{j})\in \{0,1\}$ is assigned by H once the question qi has been answered, since only H can tell how useful a question qi is with respect to his or her short-term goal gj. The cooperative H needs to express an observable sign of U (e.g. clapping or smiling for U = 1 util; neither clapping nor smiling for U = 0 utils), so that the agent Acog can measure $U({q}_{i},{g}_{j})$ simply by observing H.

To satisfy all properties ${P}_{i}\in P\ (i=1,\ldots ,4)$, we define the insight ${\mu }_{I}^{{g}_{j}}({q}_{i})$ gained from a 'yes' answer to a question qi (from the information source Q), towards achieving a goal gj, to be:

${\mu }_{I}^{{g}_{j}}({q}_{i})=S({q}_{i})\cdot U({q}_{i},{g}_{j})$          (3)

Definition (3) of the insight gain satisfies all properties ${P}_{i}\in P$, and combines the Shannon information S produced with a personal, subjective utility U (von Neumann), acting as an AI-Human bridge for gaining insight. The SN-measure's units can be taken to be ${bits}\cdot {utils}$, and it can be normalized into a mathematical (monotone) measure on $[0,1]$. It satisfies the definition of insight as a combination of discernment, understanding, and intuition, which all increase under the error-correcting, uncertainty-reducing, and familiarity-boosting IQ game.
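The sketch below implements equation (3) and verifies properties P1–P4. The probability estimates and utility values are assumptions chosen to mirror the toy universe, not values from the paper:

```python
import math

def sn_measure(p_yes: float, utility: int) -> float:
    """SN insight gain of eq. (3): mu = S(q) * U(q, g), in bits*utils."""
    return -math.log2(p_yes) * utility

# Assumed yes-probabilities and human-signalled utilities for goal g1:
mu_q1 = sn_measure(p_yes=0.25, utility=1)  # very informative and useful
mu_q2 = sn_measure(p_yes=0.50, utility=1)  # less informative, still useful
mu_q3 = sn_measure(p_yes=0.50, utility=0)  # informative about T, useless for g1
mu_q4 = sn_measure(p_yes=1.00, utility=0)  # neither informative nor useful

assert mu_q1 > mu_q2 > 0          # property P1 (and P2)
assert mu_q3 == 0 and mu_q4 == 0  # properties P3 and P4
```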

4. Shannon-Neumann strategies

For the AI agent to learn, the SN-measure (3) demands an iterated cooperation between Acog and the user H during the IQ game, in pursuit of the long-term goal of fully determining T. In this two-person (Acog, H) cooperation, each player agrees to follow a well-defined decision strategy (policy), ΠA and ΠH respectively.

Policy ΠA : given a short-term goal ${g}_{j}\in G$, Acog suggests to H a Bayesian probability distribution (evidence-based plausibility beliefs) over Q, as an ordinally ranked list of questions, from most to least likely informative. Acog can start with an estimated Bayesian prior distribution $\{p(\ {q}_{i}\ )\}$, using the relative amounts of uncertainty removed by 'yes' answers to rank the $p(\ {q}_{i}\ )$ ordinally. This means Acog's initial policy ΠA is approximate. The two-person iterated cooperative game should be designed so that ${{\rm{\Pi }}}^{A}\ \to {{\rm{\Pi }}}^{A* }$, the optimal policy. In our next paper, we show that a Shannon-Neumann strategy (based on the SN-measure) allows us to reduce a hard two-person game of cooperative inverse reinforcement learning to simpler Q-Learning, proven to converge to a near-optimum [17].

Policy ΠH : given a short-term goal ${g}_{j}\in G$, H goes through Acog's proposed probability-ranked list of questions ${q}_{i}\in Q$ and expresses (signals) their utility $U({q}_{i},{g}_{j})$.
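A minimal sketch of one round under these two policies follows. The ranking heuristic and the stubbed human signal are our assumptions for illustration; the paper leaves both to Acog's design and to H's behavior:

```python
import math

def policy_A(prior: dict[str, float]) -> list[str]:
    """Pi^A: rank questions from most to least informative,
    i.e. by ascending estimated probability of a 'yes' answer."""
    return sorted(prior, key=prior.get)

def policy_H(question: str, goal: str) -> int:
    """Pi^H: H explores a question, then signals its utility (0 or 1 util).
    Stubbed: only the color questions q1, q2 are useful for goal g1."""
    return 1 if goal == 'g1' and question in ('q1', 'q2') else 0

prior = {'q1': 0.25, 'q2': 0.5, 'q3': 0.5, 'q4': 1.0}  # assumed Bayesian prior
total_insight = 0.0
for q in policy_A(prior):                      # A_cog proposes the ranked list,
    U = policy_H(q, goal='g1')                 # H signals utility after answering,
    total_insight += -math.log2(prior[q]) * U  # A_cog accumulates mu, eq. (3)
print(total_insight)  # total SN insight gained this round (bits*utils): 3.0
```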

The IQ game enables Acog to refine its own understanding of human insight gains in an iterated manner. Recall that Acog's only purpose in life (its objective) is to maximize H's total insight gain ${\mu }_{T}^{G}(Q)$ over the course of the IQ process. Using the SN-policy reduces solving the computationally hard two-person cooperative game to a proven Q-Learning of a near-optimal policy ${{\rm{\Pi }}}^{A* }$.

We now ask an important question: is there an optimal way to structure the information source Q for the short-term goals in G? The answer is 'yes', thanks to Shannon's notion of source entropy.

The Shannon entropy S(Q) of the whole information source Q is the expected amount E of information from the source Q:

$S(Q)=E[S({q}_{i})]=-{\sum }_{i=1}^{n}{p}_{i}\,{\mathrm{log}}_{2}\,{p}_{i}$          (4)

where ${p}_{i}=p(\ {q}_{i}\ )$ is the probability of a 'yes' answer to the (yes/no) question ${q}_{i}\ \in Q\ (i=1,\ldots ,n)$. In agreement with Shannon's theory, the probabilities are set by the information source, in our case Acog, which structures Q. The AI agent Acog can construct Q so as to interpret the questions qi as independent hypotheses to be tested, and the $p({q}_{i})$ as probabilities of outcomes of independent experiments (e.g. repeated coin tosses), or as probabilities of answers to independent questions (e.g. the twenty-questions game).

In this latter interpretation, S(Q) is the expected Shannon entropy (mean information gained per question) of the source Q, which we can maximize by choosing the equilibrium distribution:

${p}_{i}=1/n\quad (i=1,\ldots ,n)$          (5)

All (yes/no) questions qi  ∈ Q then have an equally probable 'yes' answer, and the expected information gain S(Q) is maximal:

$S{(Q)}_{\max }={\mathrm{log}}_{2}\,n$          (6)

To optimize the effectiveness and efficiency of the IQ game, the agent Acog should aim to structure Q as close as possible to the equilibrium distribution, maximizing the expected information (and thus insight) gained by H over the course of the entire IQ episode. Note also that S(Q) increases with the number n of questions in Q.
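A short numerical illustration of equations (4)-(6): the source entropy is maximal at the equilibrium distribution and drops for an unbalanced source (the skewed distribution below is an arbitrary example):

```python
import math

def source_entropy(p: list[float]) -> float:
    """Shannon entropy of the source Q, eq. (4): S(Q) = -sum p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

n = 4
uniform = [1 / n] * n           # equilibrium distribution, eq. (5)
skewed = [0.7, 0.1, 0.1, 0.1]   # an unbalanced source, for comparison

print(source_entropy(uniform))  # log2(4) = 2.0 bits: the maximum, eq. (6)
print(source_entropy(skewed))   # ~1.357 bits: strictly below log2(n)
```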

5. Conclusion

In complex real-world challenges, conceptually sound solutions are built on a solid foundation of key conceptual insights, gained by posing 'good' questions at the 'right' times and places. If the foundation is weak due to insufficient insight (often strong from a strictly financial and/or technological standpoint, but weak on the human dimensions), the resulting flawed solutions can be very costly or impossible to correct downstream. The outcomes in countries with 'just-in-time' supply chains and fragmented health-care systems during the 2020 global pandemic come to mind. AI tools that boost human insight are thus invaluable in helping us build sound solutions to complex open-ended challenges: designing, implementing, optimizing, validating, and verifying complex human-technological systems and networks.

We defined a computational measure of insight gains, which an AI agent can compute simply by having an internal questioning framework $F(T,G,Q)$ and by observing how a human behaves. The measure requires an information source Q that is simultaneously informative (S, in Shannon bits) about a topic T and actionable (U, in von Neumann utils) for achieving a given goal G. This measure satisfies four necessary properties ${P}_{i}\ \in P$, as well as the intuitive definition of new insight as gains in discernment, understanding, and intuition.

From the perspective of solution-building as a two-step cycle between (A) formulating a conceptually sound component built on solid insights, and (B) executing the technical component (implementing the conceptual component), our measure ${\mu }_{I}^{G}(Q)$ is most timely, since the AI/ML revolution is increasingly automating tasks of type (B).

In fact, we argue, along with many others before us [18–22], that H should always remain heavily in the loop, no matter how potent AI/ML becomes, for two main reasons: (1) it may eventually be the only work left for us to do, and (2) no advanced non-human entity (e.g. an artificial general intelligence, AGI) can do it as well for our own benefit (just as we are incapable of doing it for, say, bonobos and orcas, no matter how much smarter we think we are). It takes a human to think and feel like one! This perspective makes our Shannon-Neumann measure ${\mu }_{I}^{G}(Q)$ most valuable as an AI-Human bridge.

In this paper, our SN-measure addressed the question of 'good' versus 'bad' questions. In paper II, we address the difficulty of determining the 'right' times and places, by formalizing the two-person IQ game as cooperative inverse reinforcement learning.

Data availability statement

No new data were created or analysed in this study.
