Introduction

Interoception, the scientific investigation of body sensations, is a popular topic, as it is plausibly related to various significant psychological processes, for example, cognitive and emotional experiences (e.g., decision making, subjective evaluation) as well as mental health (Khalsa et al. 2018). Interoception can be grasped by focusing on different modalities (Craig 2002; Ceunen et al. 2013; Ferentzi et al. 2018), among which the most popular measures are the ones that focus on the cardiac system.

The aim of this narrative review is to provide an overview of the tasks that measure perceptual ability linked to the activity of the heart (often called cardiac accuracy). As several years have passed since the previous reviews (Carroll 1977; Gannon 1977; Jones 1994; Brener and Ring 2016), it is time to discuss this subject by considering modern findings and newly developed methods. Due to the complexity of the topic, this review is necessarily incomplete; thus, to readers who want more information, we recommend reading the original studies and the aforementioned excellent reviews.

Techniques investigating heart activity-related perceptual abilities became popular with the usage of biofeedback techniques. Based on Brener’s proposition (Brener 1974a, 1977), it was assumed that precise perception of the heartbeats is a prerequisite of their voluntary control. Thus, many of the early studies also include training sessions.

Early papers distinguish three types of paradigms in relation to heartbeat perception (or more generally, interoception): self-report, tracking, and discrimination (e.g., Carroll 1977; Pennebaker and Hoover 1984). Recently, approaches that distinguish between self-reports and tasks of accuracy have become popular (Ceunen et al. 2013; Garfinkel and Critchley 2013; Garfinkel et al. 2015). We believe that cardiac accuracy should be regarded as an ability (Ferentzi et al. 2022), distinguished from typical behavior. Therefore, we are not going to focus on questionnaire measures (i.e., measures of typical or past behavior, Cronbach 1949). Additionally, to our knowledge, there is no questionnaire with a focus solely on heart activity-related sensations (although there were some studies that selected related items only, e.g., Blanchard et al. 1972).

Usually, the literature describes two distinct types of cardiac ability measures, tracking and discrimination of heartbeats (e.g., Carroll 1977; Jones 1994). During tracking tasks, participants must follow their heartbeats continuously with movements or counting. During discrimination tasks, participants must decide how an external rhythmic signal relates to their own heartbeats. This is a reasonable categorization, although in some cases, both tracking and discrimination processes are involved (see, e.g., the recently developed method, Pohl et al. 2021). We would like to add a third type which is a less frequently used approach, namely detection of change. Some authors classify these tasks in the categories mentioned above (e.g., according to Carroll 1977, it is a discrimination procedure), but we argue that they are substantially different.

If the task is to detect the change, the focus is necessarily on heart rate (HR) and not on individual heartbeats. The strength of individual beats might change, but it is unlikely to happen separately from HR change. During tracking tasks, the focus of attention (the rhythm of heartbeats vs. individual beats) might vary according to the instruction. During discrimination tasks that apply external rhythm, however, the focus is most likely on the rhythm of the beats and not on individual heartbeats. It is important to note, however, that it is not always clear what the methods measure, whether the task is about the perception of individual heartbeat or the rhythmic HR (see Jones 1994). This uncertainty is well handled by McFarland's terminology; he wrote about “heart activity perception” (McFarland 1975).

As the perception of HR change was the first to be used in research, we will start our discussion with the introduction of this approach.

Measuring change detection

As far as we know, the first study that aimed to investigate heart activity-related body sensations scientifically was from 1960 (Mandler and Kahn 1960). Because this study includes details that are not common nowadays, we will introduce this study in more detail. Mandler and Kahn (1960) investigated two participants separately with different methods. During the first attempt, the participant’s task was to indicate verbally if his HR was changing, i.e., increasing or decreasing at the moment of answering. There were sessions for four days, including various procedures and feedback about the performance. Performance did not change significantly, and the following interview suggested that HR was influenced by respiration consciously. Thus, the other participant was investigated differently, using two light bulbs. The task was to guess which bulb would light up; this was linked to HR decrease or increase. The session ran for 10 days, with ca. 450 trials per day. The participant did not show any deviation from the chance level in his answers during the entire experiment.

Another study (Epstein and Stein 1974) investigated a small sample of participants (n = 10) whose task was to indicate with button presses when they detected a change in their HR level. The reference HR value was calculated as the average HR of the previous 10 min and adjusted every 10 min. Participants were instructed to press the button as often as they could when their HR was differing from the average. Three conditions followed each other: no feedback, feedback, and no feedback. According to the results, participants did not perceive their HR changes accurately, and the accuracy of the response did not improve as a result of the feedback.

Pennebaker and colleagues also studied change detection by investigating whether participants perceived HR-related sensations, i.e., pounding of the heart (Pennebaker 1982; Pennebaker et al. 1982). The study consisted of various tasks (e.g., cold pressor test, mental arithmetic) to manipulate physiological and mood states. At baseline, during, and following the tasks, physiological measurements (e.g., HR, blood pressure) were taken. Participants were also asked about the presence of various symptoms, among which one was “pounding heart.” Pennebaker himself interpreted that this was about “self-reported fast pulse” that was compared with “actual HR” (Pennebaker 1982, p.64). It is a question of whether strictly speaking this task assessed the heartbeat or HR perception, i.e., what was in the focus of the attention.

A new task, called the CARdiac Elevation Detection Task (CARED) (Ponzo et al. 2021), aims to measure how good participants are at perceiving changes in HR during everyday activities. The introduced measurement lasted at least four weeks, each day between 9 a.m. and 9 p.m., during which participants’ HR was detected by a smartwatch. When they got notifications, participants were instructed to judge whether their HR was higher than usual and to provide confidence ratings and free descriptions of prior (last 30 min) activities. To minimize the chance of knowledge-based reports, judgments after HR-increasing activities (i.e., high-intensity activities and highly emotional states) were excluded from the analysis. The advantage of this method is its high ecological validity, which rarely characterizes heartbeat perception methods. The exclusion of arousing activities, however, is problematic from this point of view, as the importance of cardiac perceptual ability is especially emphasized in relation to activities in which increased HR is informative (Köteles 2021a).

As we mentioned above, Carroll (1977) categorized these methods under discrimination tasks. We argue that because change detection does not involve forced choice, the required response pattern is substantially different from the one required during discrimination tasks. Also, it is always about the detection of HR, i.e., the detection of a series of stimuli and not about single events. Finally, it does not involve external stimuli; the comparison happens between one's own (previous and recent) HRs.

Discrimination measures: main types

During measures of heartbeat discrimination, the task is to compare internal sensations with an external signal(s) and decide whether they are synchronous or not. They are also called heartbeat detection tasks (e.g., Ring and Brener 2018).

Because of the large number of methods, it is hard to cover all the versions. In the following, we provide a list of some of the variations of the main features. These variations are also summarized in a table (Table 1).

Table 1 Main types of discrimination paradigm

Studies conducted so far used various external stimuli, including tactile (e.g., vibratory, see Brener and Jones 1974), visual (e.g., light flash, see Whitehead et al. 1976, 1977; Yates et al. 1985), and auditory (e.g., Brener and Kluvitse 1988; Ring and Brener 1992). Nowadays, the usage of auditory stimuli is the most common. According to our knowledge, it was Brener and Kluvitse (1988) who used auditory stimuli for the first time instead of visual ones, to deal with the delayed response due to time needed to change the visual fixation.

Most of the studies use an external stimulus that is linked to the R wave of the ECG signal and add different lengths of delay following the R wave. A less frequently used method (typically used by early studies) to produce non-heart-contingent stimuli is to present a rhythm that follows the participants’ HR in a certain previous time period; thus, it is not linked directly to the ongoing heart activity (Brener and Jones 1974). Another similar method is when the non-contingent stimulus is based on the participant’s pre-recorded ECG (Weisz et al. 1988).

Studies using the rhythm of the heartbeats directly differ in the number and length of the applied delay. In some cases, there were only two delays (e.g., Whitehead et al. 1977) while in others, there were several ones (e.g., six, Yates et al. 1985; Brener and Kluvitse 1988). Some studies also included a training session (Brener and Jones 1974), typically studies from the biofeedback area.

Probably the most important questions related to discrimination methods are the number of delays and the applied calculation. We will discuss these related topics in the following in more detail.

Discrimination measures: the relation of the external stimulus to the heartbeat

During heartbeat discrimination tasks that use two types of external stimulus only, usually, one is supposed to represent the real heartbeats while the other does not. This can be achieved with two methods: either by using a delay that is probably not sensed together with the heartbeats, or by using a rhythmic stimulus that is not related timewise to the heartbeats.

The first classical discrimination task is from Brener and Jones (1974). They used an external stimulus that was either contingent or non-contingent with the heartbeats. Non-contingent rhythm was produced with a pulse generator that was set at the HR of the participant. The downside of this method was that with the modification of the heartbeats (e.g., with respiration), people could learn to cheat (Katkin et al. 1981). Modified versions of the Brener-Jones method were also used later (e.g., Clemens and MacDonald 1976; Weisz et al. 1988).

The Whitehead procedure, one of the most often mentioned discrimination methods, provided an alternative (Whitehead et al. 1976, 1977). During the Whitehead procedure, rhythmic external stimuli are presented either with 128 or 384 ms delay after the R wave of the ECG. The 128 ms long delay is supposed to represent the immediate feedback, based on the rationale that 100–150 ms is needed for the blood pulse wave to reach the neck (Whitehead et al. 1976). Thus, it was also assumed that the neck is where heartbeats are perceived. Later, Whitehead and colleagues (1977) also assumed that it is the contraction of heart muscles that we feel; a view that was shared by others (e.g., Schandry and Specht 1981; Katkin 1985).

Many following studies used the technique of Whitehead, usually with some modifications (e.g., Katkin et al. 1981, 1982; Knoll and Hodapp 1992). For example, Katkin and colleagues (1982) modified the Whitehead procedure in a way that instead of immediate and delayed response they provided stimuli that were presented either at fixed or varied time intervals after the R wave. In the case of the latter, the added delay gradually grew after each heartbeat. Thus, with the modification of their own HR, the non-contingent stimulus changed too.

The Whitehead procedure was criticized because it did not take into account the possible individual differences in perceptual abilities (Yates et al. 1985; Ring and Brener 2018). It is also an open question which heart activity-related event of the cardiac cycle is sensed and when (Schandry et al. 1993).

Clemens (1984) was the first who studied systematically whether the 128 ms was the optimal delay to be the correct choice of heartbeat perceivers and examined several temporal loci for the S + signal in the Whitehead-type method. It was found that between 0 and 200 ms time delays, participants have about the same chance of judging the stimulus as simultaneous with their heartbeat. Yates and colleagues (1985) applied more than two types of external stimuli, i.e., stimuli followed the R wave with 0, 100, 200, 300, 400, or 500 ms of delay. Contrary to the results of Clemens (1984), this paper reports that the 200–400 ms time delays were judged simultaneous more often than the others. These findings emphasize the importance of taking into account individual differences in the timing of heartbeat perception relative to the R peak (Ring and Brener 2018). Some studies using more than two delays used a single stimulus (Yates et al. 1985) while the majority used longer rhythmic sequences (Ring and Brener 1992, 2018).

Couto and colleagues (2015) criticized the discrimination paradigm because attending simultaneously to the cardiac sensations and external stimuli would generate interference, which is a confounding factor. According to Ring and Brener (2018), however, this is not a valid counterargument, as the nervous system is constantly involved in tasks of parallel processing of external and internal stimuli and also because judging simultaneity is a well-developed human ability. Interestingly, developers of a new measure called the Interoception–Exteroception Synchronicity Judgment (IESJ) task argue that the ability to compare simultaneous interoceptive and exteroceptive stimuli is a significant one (Yang et al. 2022b) and developed a method specifically to investigate this ability.

Discrimination measures: calculation

Another important topic is the calculation of heartbeat perception accuracy. This is linked to more general topics, i.e., how discrimination methods are conducted and what they measure. Generally speaking, most of the discrimination methods apply forced choice, i.e., participants have to choose between two possibilities, even if they are ambiguous or have no sensations (except for instance: Brener and Kluvitse 1988, where they had three options, “very certain”, “not very certain” and “I don’t know.”) This response format makes it possible to grasp non-conscious heartbeat sensation which is a pro of discrimination tasks in general (Köteles 2021b).

Some of the discrimination tasks (Yates et al. 1985; Brener and Kluvitse 1988; Brener et al. 1993) apply the classical psychophysical method of constant stimuli (Engen 1971) using six delays. Among these, Yates and colleagues (1985) used only a single stimulus. Thus, this task of the discrimination paradigm requires heartbeat perception and not HR perception. The task, i.e., matching a single tone with a single heartbeat, is very difficult; thus, the method did not become popular. Brener and Kluvitse (1988) applied rhythm, but in their case, the participant could freely choose among the six external rhythms with a button. Each trial ends when the participant has chosen which of the six intervals is simultaneous with their own heartbeats.

The two-interval version of the discrimination task (Whitehead et al. 1976; Katkin et al. 1981) makes possible the usage of signal detection theory. The advantage of this method is that it separates “sensitivity” and “bias”, the tendency to give a certain kind of answer for reasons that have nothing to do with the accuracy of perception (Whitehead et al. 1977). This advantage is overshadowed by the criticisms mentioned above (i.e., low number of applied delays).
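The separation of sensitivity and bias can be illustrated with a minimal sketch of the standard signal detection indices d′ and criterion c. The mapping of responses is an assumption made here for illustration: a "hit" is a "simultaneous" response on short-delay trials and a "false alarm" is a "simultaneous" response on long-delay trials. The function name and the log-linear correction for extreme rates are likewise assumptions of this sketch, not a procedure taken from the cited studies.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for a two-choice discrimination task.

    Hypothetical helper: counts are the four response categories of a
    two-interval heartbeat discrimination session.
    """
    # Log-linear correction (an assumption of this sketch) keeps both
    # rates strictly between 0 and 1 so the z-transform stays finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


if __name__ == "__main__":
    d_prime, criterion = sdt_indices(30, 10, 10, 30)
    print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A participant who answers "simultaneous" equally often on both trial types obtains d′ near 0 regardless of how often that answer is given overall; the overall tendency is captured by c instead.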

Authors using six intervals typically calculate two indices to measure heartbeat perception ability. One defines heartbeat perceivers (vs non-perceivers) with the usage of a chi-square test to decide whether the choices of the participants differ from the uniform distribution or not. The other quantifies cardiac accuracy by calculating the interquartile range (IQR) based on the cumulative percentage frequency distributions of simultaneous judgments (Ring and Brener 1992, 2018; Brener et al. 1993).
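Both indices can be sketched in a few lines. The example below is illustrative only: the function names are hypothetical, the chi-square statistic is compared with the tabulated critical value for five degrees of freedom at α = 0.05 (11.07), and the quartiles are obtained by linear interpolation within 100 ms wide bins centred on each delay. The interpolation rule is an assumption of this sketch; the cited papers should be consulted for the exact procedure.

```python
CHI2_CRITICAL_DF5 = 11.07  # tabulated chi-square critical value, df = 5, alpha = .05


def chi_square_uniform(counts):
    """Goodness-of-fit statistic of the choice counts against a uniform distribution."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def interpolated_quantile(delays, counts, q, bin_width=100.0):
    """Delay (ms) below which a proportion q of the 'simultaneous' choices fall.

    Assumption of this sketch: each delay is the centre of a bin of
    width bin_width, and choices are spread evenly within the bin.
    """
    target = q * sum(counts)
    cumulative = 0.0
    for delay, count in zip(delays, counts):
        if count > 0 and cumulative + count >= target:
            lower_edge = delay - bin_width / 2
            return lower_edge + bin_width * (target - cumulative) / count
        cumulative += count
    return delays[-1] + bin_width / 2


def judgment_iqr(delays, counts):
    """Interquartile range of the distribution of simultaneity judgments."""
    return (interpolated_quantile(delays, counts, 0.75)
            - interpolated_quantile(delays, counts, 0.25))


if __name__ == "__main__":
    delays = [0, 100, 200, 300, 400, 500]
    counts = [2, 5, 30, 15, 5, 3]  # made-up data for illustration
    chi2 = chi_square_uniform(counts)
    print("perceiver" if chi2 > CHI2_CRITICAL_DF5 else "non-perceiver")
    print(f"IQR = {judgment_iqr(delays, counts):.1f} ms")
```

A narrower IQR means that the choices cluster around one delay, which is read as more precise heartbeat perception.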

Wiens and Palmer (2001) raised a problem with the chi-square statistic, which also classifies a participant with a two-peak distribution as a detector. According to them, this is a problem, because it is not likely that heartbeat perceivers can be characterized with two distant peaks. They argue that an inverted U-shaped histogram characterizes a heartbeat perceiver. The assumption that good detectors rarely choose the 0 and the 500 ms delays as synchronous supports the usage of quadratic trend analysis, i.e., testing whether a second-order function gives a significantly better fit to the distribution of the choice ratio across the six intervals than a linear trend.

Tracking measures: main types

During heartbeat tracking measures, the task is to follow the ongoing heartbeats with counting (mental tracking) or finger movements, typically with tapping (motor tracking). The most popular version is from Schandry (1981). Using his task (often called Schandry task), cardiac accuracy is calculated by comparing the number of actual heartbeats to the number of reported heartbeats using the following formula for each interval: (1 − |(HBactual − HBreported)/HBactual|) and then averaging the scores of the intervals. Higher (i.e., close to 1) values presumably indicate higher cardiac accuracy. It is important to note, however, that Schandry was not the first who used heartbeat tracking (Brener 1974b; McFarland and Campbell 1975; McFarland 1975; Dale and Anderson 1978; Hamano 1980; Pennebaker 1981; Yates et al. 1985), and studies using his method usually slightly alter his original version (e.g., typically they explicitly prohibit guessing). The first versions of the tracking task required motor tracking with a button press (Brener 1974b; McFarland 1975).
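The scoring formula can be written out as a short sketch. The function name and the example counts are hypothetical; the arithmetic follows the formula given above, applied per interval and then averaged.

```python
def tracking_accuracy(actual_counts, reported_counts):
    """Mean of 1 - |actual - reported| / actual across counting intervals."""
    scores = [
        1 - abs(actual - reported) / actual
        for actual, reported in zip(actual_counts, reported_counts)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up counts for three intervals of different lengths.
    actual = [30, 40, 50]
    reported = [24, 30, 45]
    print(round(tracking_accuracy(actual, reported), 3))
```

Note that the absolute value makes the score blind to the direction of the error: reporting 24 or 36 beats against 30 actual beats yields the same interval score of 0.8.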

A recently published paper summarized the main versions of the mental heartbeat tracking task (Ferentzi et al. 2022, Table 1), but it also includes possible variations and recommendations. This review does not cover the latter aspects but also includes motor tracking tasks. Table 2 summarizes some important characteristics.

Table 2 Main types of tracking paradigm

The measurement of the heartbeats might happen with an ECG (Schandry 1981; Ring and Brener 2018), pulse oximeter (Corneille et al. 2020), or chest belt (Desmedt et al. 2018). We recommend the usage of ECG because according to our personal experience in our lab, both the pulse oximeter and the chest belt are able to produce extra sensations of cardiac activity by pressing the body’s surface.

The observation of heartbeats might happen during intervals of various numbers and lengths, for instance for 25, 35, and 45 s (Schandry 1981; Ring and Brener 2018; Pohl et al. 2021), or four 2 min long intervals (McFarland 1975). Some studies use as many as 11 intervals (Pollatos et al. 2005) or make different length variations for half of the participants (Murphy et al. 2018a). Some studies also have a practice phase before the measurement trials (e.g., 5 s, Pohl et al. 2021).

The exact formulation of the instruction is especially significant, as it influences many important aspects of the task performance (Ferentzi et al. 2022). Instructions usually do not reveal any specific information about the upcoming length of the trials, including only that they vary in length (Ring and Brener 2018). Studies usually include explicit instructions regarding the conduction of the task, such as not to guess, not to count without having heartbeat-related sensations, or to count weak or uncertain sensations as well. Instructions thus usually signal which type of error participants should avoid: false positive counts or false negative ones. This aspect strongly affects accuracy scores. It is a problem, however, that the exact applied instruction is not always published.

Tracking measures: instruction and validity

The instruction, i.e., what is exactly asked from the participant is especially important in the case of the mental tracking tasks. Thus, it is essential to report the exact instruction (Ferentzi et al. 2022). The wording of the instruction has come into focus with the recent debate around the validity of mental tracking tasks, and as they are closely linked, we are going to discuss these topics together.

The validity of the tracking tasks has been questioned from early on (e.g., Carroll 1977; Pennebaker 1981; Yates et al. 1985; Flynn and Clemens 1988; Weisz et al. 1988; Katkin and Reed 1988; Jones 1994), but more recently, the debate on its scoring, reliability, and validity has heated up (Zamariola et al. 2018; Ainley et al. 2020; Corneille et al. 2020; Zimprich et al. 2020). The Schandry task has been criticized mainly because it is presumably influenced by factors that do not reflect cardioceptive abilities (Ring and Brener 2018; Zamariola et al. 2018). Instead, it is influenced by factors such as beliefs about and knowledge of HR (Ring and Brener 1996, 2018; Windmann et al. 1999; Ring et al. 2015), expectations (Körmendi et al. 2021), and time estimation ability (Desmedt et al. 2020). What the task measures depends partially on the applied instruction (Ehlers et al. 1995; Desmedt et al. 2018).

It is particularly important to specify in the instruction which body part the participant should focus on, for example a specific body part (e.g., the chest only) or the entire body (Ferentzi et al. 2022).

An often-cited version of the mental tracking task allows the estimation of heartbeats (Schandry 1981). Supposedly, this is to deal with the weakness and vagueness of the cardiac sensations. Please note that many of the subsequently discussed studies use different instructions for the mental heartbeat tracking task, i.e., they explicitly do not allow estimation (except, e.g., Pollatos et al. 2007; Ferentzi et al. 2021).

There are some methods that can be applied to lower the possibility of the inclusion of non-cardiac ability-related factors and/or control their effect. It has been recommended to enhance reality check with the inclusion of penalty for too many heartbeats (Ferentzi et al. 2022). Before the task, some studies investigate the knowledge about HR by asking the participants about their personal HR (Ring and Brener 1996; Ring et al. 2015), their usual HR (Desmedt et al. 2020), or HR in general (Murphy et al. 2018b), or check time estimation accuracy (Desmedt et al. 2020). After the task, it has been recommended to check whether the instruction has been remembered by asking for a written recall of it (Ferentzi et al. 2022). Another method is to ask about the applied strategy after the task (Desmedt et al. 2018).

Tracking measures: required response and calculation

The main differences between the motor and mental versions of the tracking paradigm are the response required by the tasks and the method to calculate cardiac accuracy. Although the mental version of the tracking paradigm is more popular in interoception studies, the motor version has some advantages. One of the advantages of motor tasks (as opposed to mental tracking tasks) is that they do not require extra working memory capacity (Richards et al. 1996; Couto et al. 2015). More importantly, the motor version allows us to identify the relation between the detected and real heartbeats to some extent. However, there are some versions that do not use this advantage and only work with the number of taps (McFarland 1975; Weisz et al. 1988), as we will see below. Another question is whether the possibility of false alarms is taken into account. The mental tracking method is not able to deal with these aspects; on the other hand, its score is not influenced by the disturbing factor that the extra sensations of button presses might cause. Typically, during motor tracking tasks, participants have to follow their heartbeats by tapping with their finger (Weisz et al. 1988) or with button presses (Brener 1974b; McFarland 1975).

Criticism of the Schandry task

In the following, we are going to briefly summarize the recent debate about the validity of the Schandry task by focusing on the first paper (Zamariola et al. 2018) and by adding our own comments on the subject as well. We are not going to cover all the arguments; however, for readers who are interested, we recommend the original papers (Zamariola et al. 2018; Ainley et al. 2020; Corneille et al. 2020; Zimprich et al. 2020).

Zamariola and colleagues (2018) mentioned four characteristics of the Schandry task that can be criticized. Firstly, they found it problematic that most participants during the Schandry task under-report their heartbeats. Interestingly, they did not mention among the possible reasons that instructions typically encourage the avoidance of false positive detections. According to Zamariola and colleagues (2018), this tendency implies the effect of accuracy-irrelevant factors, typically linked to decision threshold. Ainley and colleagues (2020), however, argued that the expectation regarding false positive counting is not self-evident, as missing a heartbeat is more natural than hallucinating one. In response to this, Corneille and colleagues (2020) state that over-reporting might have significance in clinical research, but in the case of people with high Schandry scores, good detectors and the “hallucinators” are mixed. Thus, in their opinion, the Schandry scores can be considered a better indicator of underestimation than interoceptive accuracy (Zamariola et al. 2018; Corneille et al. 2020).

Secondly, Zamariola and colleagues (2018) stated that the correlation between actual and reported heartbeats is low. Corneille and colleagues (2020) argue furthermore that this low correlation suggests the contribution of non-interoceptive processes and questions the validity of the Schandry scores. This criticism can be challenged in several ways. Both non-interoceptive and interoceptive processes target the estimation of the heartbeats. Thus, low correlation cannot be simply explained by non-interoceptive strategies. Additionally, a high correlation cannot be expected between these values, because although the counted numbers aim to reflect the number of real heartbeats determined by physiological factors, the individual variability in the perceptual ability is much greater than the variability in the HR itself. Therefore, the variability of the counted heartbeats can be regarded as constant, which means that the estimation of an unknown constant value is not correlated with the constant value. Moreover, this correlation is higher among people with average scores than among people with higher scores (Zamariola et al. 2018). According to Ainley and colleagues (2020) and Zimprich and colleagues (2020), however, we cannot expect a higher correlation among better-scoring subjects than among the lower-scoring ones, because of the mathematical characteristics of the ratio arithmetic.

The third critical point of Zamariola and colleagues (2018) is that the Schandry scores are increased at slower HR. The authors explain this with the under-reporting of the participants, i.e., that lower actual HR causes a lower difference. Ainley and colleagues (2020) prove this negative relationship also by the ratio arithmetic of the formula. Corneille and colleagues (2020), however, explained this phenomenon by the fact that lower HR is accompanied by a greater stroke volume when the cardiac output is constant which means that cardiac perception is confounded by the strength of the heartbeat signal.

The fourth and final point of Zamariola and colleagues (2018) is that the Schandry scores are lower for longer intervals than for shorter ones, so the underestimation increases with longer time intervals. The same phenomenon was described in the case of time estimation, which led the authors to the conclusion that heartbeat estimation is based on the use of HR-related knowledge and beliefs. While Ainley and colleagues (2020) did not, Corneille and colleagues (2020) replicated the results of Zamariola and colleagues (2018). All these authors recommend setting constant time intervals across studies. Interestingly, the possibility that attention is hard to sustain for a longer period is not highlighted by any of these authors.

Dealing with motor responses

McFarland (1975) calculated cardiac ability based on his motor task by taking the absolute difference between the number of button presses and the number of actual heartbeats and dividing by the number of heartbeats. This ratio was subtracted from 1 in order to get a score for which larger values indicate better performance. This method was applied by Schandry (1981) in his mental tracking version with some modifications, as McFarland did not consider the time periods when there were no button presses.

Ludwick-Rosenthal and Neufeld (1985) also applied a motor version of the tracking paradigm but applied different calculations. The task of the participants was to tap with their index finger to the beat of their own heart rhythm. They worked with the latency of the taps, i.e., calculated the time difference between the R peak of the ECG and the button press. After this, they calculated the standard deviation of tap latencies for three interoceptive sessions. The average of these values served as a measure of interoceptive accuracy.
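The latency-based index can be sketched as follows. Several details are assumptions made for illustration, since they are not specified above: each tap is paired with the most recent preceding R peak, taps before the first R peak are dropped, the sample standard deviation is used, and all names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, stdev


def tap_latencies(r_peaks, taps):
    """Time from the most recent preceding R peak to each tap (same units as input).

    r_peaks must be sorted in ascending order.
    """
    latencies = []
    for tap in taps:
        index = bisect_right(r_peaks, tap) - 1
        if index >= 0:  # ignore taps that occur before the first R peak
            latencies.append(tap - r_peaks[index])
    return latencies


def latency_sd_score(sessions):
    """Average, across sessions, of the standard deviation of tap latencies.

    sessions is a list of (r_peaks, taps) pairs, one pair per session.
    """
    return mean(stdev(tap_latencies(r_peaks, taps)) for r_peaks, taps in sessions)


if __name__ == "__main__":
    r_peaks = [0.0, 1.0, 2.0, 3.0]       # made-up R peak times in seconds
    steady = [0.25, 1.25, 2.25, 3.25]    # constant lag after each R peak
    jittery = [0.2, 1.3, 2.2, 3.3]       # variable lag
    print(latency_sd_score([(r_peaks, steady)]))
    print(latency_sd_score([(r_peaks, jittery)]))
```

With this index, lower values indicate more consistent tapping relative to the R peak; the size of the lag itself does not affect the score.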

A relatively new version of the motor task consists of five conditions that follow each other in fixed order (Melloni et al. 2013; Sedeño et al. 2014; Couto et al. 2014). Participants must tap with their heartbeats on a computer’s keyboard. Among these taps, only those are considered that happen in a pre-defined time window around the R wave of the ECG. The size of the time window depends on the participant’s HR. For example, with an HR of 69.75–94.25, the time window is between 0.1 s before and 0.6 s after the R wave (Sedeño et al. 2014). This method, however, does not take into account the possibility that some keypresses might happen close to the R wave (i.e., in the pre-defined time window), but they are still random events. Moreover, this calculation does not take into account individual differences in heartbeat perception that are not related to HR (just like in the case of the Whitehead task, see Ring and Brener 2018).

New versions of the motor tracking method (Smith et al. 2020, 2021; Körmendi et al. 2022) offered new approaches regarding the way to calculate accuracy score. During both measures, the task of the participants was to move their fingers in synchrony with their heartbeats, either by pressing a button (Smith et al. 2020, 2021) or moving their fingers in the air (Körmendi et al. 2022). The disadvantage of the former is that it involves extra tactile information, which might be a disturbing factor. Although both methods provide an accuracy score, the applied calculation differs. Firstly, Smith and colleagues (2020, 2021) also take into account tappings that are before the heartbeats which means that they accept “guessing” with a bigger chance, e.g., if somebody catches the right rhythm based on a couple of sensed beats. Secondly, they use a different calculation to estimate the amount of synchronicity between the finger movements and heartbeats.

Smith and colleagues (2020, 2021) used the variance of the differences between the taps and the closest heartbeats as a consistency score. The differences were assigned negative values when the heartbeat preceded the tap. Because a higher HR decreases the variance, the score was corrected with a bootstrap method to remove this bias.
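
The uncorrected part of such a score can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: the function names are hypothetical, ties between two equally distant heartbeats are resolved toward the earlier one, and the bootstrap correction for HR is deliberately omitted because its details are not described here.

```python
import numpy as np

def signed_diffs_to_nearest_beat(r_peaks, taps):
    """Signed difference (s) between each tap and its closest R peak.

    Negative when the closest heartbeat precedes the tap.
    Requires at least two R peaks, sorted in time.
    """
    r = np.asarray(r_peaks, dtype=float)
    t = np.asarray(taps, dtype=float)
    idx = np.clip(np.searchsorted(r, t), 1, len(r) - 1)
    prev_r, next_r = r[idx - 1], r[idx]
    # Assumption: ties go to the earlier heartbeat
    nearest = np.where(t - prev_r <= next_r - t, prev_r, next_r)
    return nearest - t

def tap_consistency_variance(r_peaks, taps):
    """Variance of the signed tap-to-heartbeat differences (no HR correction)."""
    return float(np.var(signed_diffs_to_nearest_beat(r_peaks, taps), ddof=1))
```

Because the differences are bounded by half an R-R interval, shorter intervals (higher HR) compress the attainable variance, which is the bias the bootstrap correction is meant to address.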

Körmendi and colleagues (2022) suggest a new way to exclude random movements by applying circular statistics. It was assumed that heartbeats are quasi-periodic; one R-R interval can be considered a period. In the language of circular statistics, one period is described as 360 degrees, and the timing of each finger movement within the period can be expressed as an angle. The Rayleigh test shows whether the angles linked to the finger movements differ from the uniform distribution or not. Thus, this technique identifies the non-random finger movements that are synchronous with the heartbeats. Both the method of Smith and colleagues (2020, 2021) and that of Körmendi and colleagues (2022) provide accuracy scores (which probably correlate); but while the calculation method of Körmendi and colleagues determines the statistical significance of cardioceptive accuracy by a hypothesis test, Smith and colleagues provide a performance score only.
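
The logic of the circular approach can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, angles are expressed in radians rather than degrees, taps outside the recorded R-R intervals are discarded, and the p value uses a common closed-form approximation of the Rayleigh test (as given in standard circular statistics textbooks and toolboxes).

```python
import numpy as np

def tap_phases(r_peaks, taps):
    """Phase (radians, 0 to 2*pi) of each tap within its enclosing R-R interval."""
    r = np.asarray(r_peaks, dtype=float)
    t = np.asarray(taps, dtype=float)
    idx = np.searchsorted(r, t, side="right") - 1
    # Keep only taps that fall inside a complete R-R interval
    valid = (idx >= 0) & (idx < len(r) - 1)
    idx, t = idx[valid], t[valid]
    rr = r[idx + 1] - r[idx]
    return 2 * np.pi * (t - r[idx]) / rr

def rayleigh_test(phases):
    """Rayleigh test of circular uniformity.

    Returns (mean resultant length, z statistic, approximate p value).
    """
    phases = np.asarray(phases, dtype=float)
    n = len(phases)
    r_bar = np.abs(np.mean(np.exp(1j * phases)))  # 0 = uniform, 1 = perfectly aligned
    big_r = n * r_bar
    z = big_r ** 2 / n
    # Closed-form approximation of the p value
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n ** 2 - big_r ** 2)) - (1 + 2 * n))
    return float(r_bar), float(z), float(min(p, 1.0))
```

A small p value indicates that the taps cluster at a preferred phase of the cardiac cycle, while the mean resultant length can serve as a continuous index of how tightly they cluster.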

Mixed methods

Some of the new methods resemble one paradigm or another in certain respects, but assigning them to a single paradigm would be artificial, since they differ from it in essential ways.

Combination of matching and tracking

There are two new methods that require the matching of the heartbeats with external stimuli, either visual (Palmer et al. 2019) or auditory (Plans et al. 2021).

The task of Palmer and colleagues requires active reproduction of the rhythm of the ongoing heartbeats (Palmer et al. 2019) by changing the visually presented HR with a slider that changes the interbeat interval between two alternating hearts of different sizes. As participants could spend as much time on this task as they wished, the fluctuation of heart activity sensation (Ainley et al. 2016) is not a confounding factor.

During the task of Plans and colleagues, called the Phase Adjustment Task (PAT) (Plans et al. 2021), participants must adjust rhythmic tones by changing their phase until they are synchronous with their heartbeats. The task works with strong assumptions, which are also reflected in the results it provides: it classifies people as interoceptive, not interoceptive, or not classifiable.

Combination of signal detection and tracking

There are two new methods that combine signal detection theory and tracking of heartbeats. One of them requires the participant to indicate the presence of a heartbeat sensation while a visual stimulus is displayed (Herman et al. 2021). In the other, participants have to count their heartbeats during a certain time period, and afterward, they are given a forced-choice task on the number of heartbeats counted (Pohl et al. 2021).

The method of Herman and colleagues is designed so that it can be conducted during MRI measurements (Herman et al. 2021). It is, however, quite a complicated task, with high cognitive demand. People are instructed to focus on their heartbeats while looking at crosses of different colors. Each cross is linked to a certain finger, and participants have to indicate heartbeat sensation by moving the corresponding finger.

Another version of the mental tracking paradigm enables the usage of signal detection (Pohl et al. 2021). The participant’s task is to focus on their own heartbeat, and after a short time period, they have to choose between two options. One is a short interval of heartbeat counts (e.g., 6–8 in the case of 7 actual heartbeats), while the other is either “less” or “more”. Each trial lasts for 7–11 consecutive heartbeats, as assessed by parallel ECG measurement. Five different interval lengths are presented. Participants’ answers are classified as hits, misses, false alarms, or correct rejections so that signal detection theory can be applied. Based on these, d’ (sensitivity) and c (response bias) values are calculated. These were shown to be related to the score calculated in the mental tracking task, supporting the assumption that the Schandry score is a mixture of both sensitivity and response bias (Pohl et al. 2021).
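
For readers less familiar with signal detection theory, the two indices can be computed from the four response counts as in the following minimal Python sketch. It shows the standard textbook formulas, not the specific analysis pipeline of the cited study; the function name and the optional log-linear correction (adding 0.5 to each cell to avoid rates of exactly 0 or 1) are assumptions for illustration.

```python
from statistics import NormalDist

def sdt_indices(hits, misses, false_alarms, correct_rejections, loglinear=True):
    """Return (d_prime, c) from the four signal detection response counts.

    Assumption: with loglinear=True, 0.5 is added to every cell so that
    hit and false alarm rates never reach exactly 0 or 1. With
    loglinear=False, such extreme rates raise an error in inv_cdf.
    """
    if loglinear:
        hits, misses = hits + 0.5, misses + 0.5
        false_alarms = false_alarms + 0.5
        correct_rejections = correct_rejections + 0.5
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa            # sensitivity
    c = -0.5 * (z_hit + z_fa)         # response bias (0 = unbiased)
    return d_prime, c
```

A d’ of 0 means that hits are no more frequent than false alarms, whereas a negative c indicates a liberal tendency to respond “yes” and a positive c a conservative one.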

Combination of tracking and change detection

A new method builds on mental heartbeat tracking but is optimized for sensitivity to changes in the rate of heartbeats (Larsson et al. 2021). The authors used several heartbeat counting trials, which made it possible to fit a linear regression to the pairs of reported and actual HR. The slope of the fitted line is 1 in the case of perfect correspondence between the reported and actual heartbeats. The intercept (α) of the line indicates the participant’s tendency to over-report (α > 0) or under-report (α < 0) the heartbeats. It was hypothesized that people who are better at perceiving their heartbeats are also better at perceiving the changes in the resting HR; thus, the number of reported heartbeats should change accordingly.
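
The regression step can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and it assumes that the actual HR is the predictor and the reported HR the outcome, fitted by ordinary least squares.

```python
import numpy as np

def counting_regression(actual_hr, reported_hr):
    """Fit reported HR = slope * actual HR + intercept across counting trials.

    Returns (slope, intercept). A slope of 1 with an intercept of 0
    corresponds to perfect agreement between reported and actual HR.
    """
    x = np.asarray(actual_hr, dtype=float)
    y = np.asarray(reported_hr, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    return float(slope), float(intercept)
```

A slope close to 0 would suggest that the reported counts do not follow trial-to-trial changes in HR at all, regardless of how close the average report is to the average HR.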

Combination of discrimination and heart rate perception

A recently developed discrimination task (Legrand et al. 2022) asks participants to focus on their HR and afterward to decide whether a series of sounds is faster or slower than it. To control for possible non-interoceptive processes (such as working memory or time estimation), an exteroceptive control condition was also included, during which participants had to differentiate between the frequency of sound sequences (faster vs. slower). The difference in frequency (Δ-BPM) between the two stimuli of the interoceptive or the exteroceptive tasks was selected using an adaptive staircase method that adjusted the Δ-BPM values according to the participant’s preceding responses. The adaptive Bayesian psychophysical method (“Psi”) was used to measure interoceptive accuracy. This method estimates the probability of the second stimulus being perceived as having a higher frequency as a function of the difference in frequency (Δ-BPM) between the two stimuli. On this curve, the 0.5 probability point (α) shows the Δ-BPM value at which there is an equal chance of judging the feedback sequence as faster or slower than the perceived HR (interoceptive task) or the preceding stimulus (exteroceptive task). If this α value is negative, the participant underestimates HR; if it is positive, HR is overestimated. The absolute value of α indicates the magnitude of the under- or overestimation. Another value (β) estimates the slope of the curve at this 0.5 probability point, which represents the uncertainty of the decision. The advantage of this method is that it measures the ability to detect HR by estimating the statistical distribution function, rather than the ability to detect individual heartbeats. The large number of trials (80 interoceptive and 80 exteroceptive), however, is probably overwhelming for the participants.
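
To make the roles of the two parameters concrete, the psychometric function can be sketched as below. This is a simplified illustration, not the published fitting procedure: it assumes a cumulative normal curve in which α is the 0.5 probability point and β acts as the spread of the curve (a larger β gives a shallower slope); the function name is hypothetical, and no adaptive trial selection or parameter estimation is shown.

```python
from statistics import NormalDist

def p_judged_faster(delta_bpm, alpha, beta):
    """Probability of judging the tones as faster than the reference.

    Assumptions: cumulative normal psychometric function with
    threshold `alpha` (Δ-BPM at which the probability is 0.5) and
    spread `beta` > 0 (larger values mean a shallower curve and
    thus more uncertain decisions).
    """
    return NormalDist(mu=alpha, sigma=beta).cdf(delta_bpm)

if __name__ == "__main__":
    # Hypothetical participant with alpha = -5 BPM and beta = 8 BPM
    for delta in (-20, -5, 0, 20):
        print(delta, round(p_judged_faster(delta, alpha=-5.0, beta=8.0), 3))
```

With a negative α, tones presented at the participant's true HR (Δ-BPM = 0) are judged as faster more than half of the time, which is the pattern interpreted as underestimation of the own HR.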

Measuring the cardiac ability of infants and children. Indirect measures

Measures designed to assess the cardiac ability of infants and children represent a specific category. The main challenge is to deal with the lack of verbal reports and/or arithmetic ability.

The Infant Heartbeat Task (iBEAT) aims to investigate the cardiac ability of infants by applying a sequential looking paradigm (Maister et al. 2017). During this task, an animated character is moving either in synchrony or asynchrony with the infant’s own heartbeat. At the group level, results showed that the asynchronous stimulus is preferred. Later, a modified version of the iBEAT was applied to an adult sample (Yang et al. 2022a).

Another task described as an adaptation of the Schandry task (1981) investigates preschool-aged children (4–6 years old) (Schaan et al. 2019). After doing jumping jacks for 10 s, children had to choose among four circles of different sizes (indicating slow, moderate, quick, and very fast HR) the one that represented their HR the best.

These methods were designed to require a different kind of response from the participant than the usual tasks; therefore, it is an open question to what extent they are comparable with them. The results provided by the method of Maister and colleagues (2017) were supported by the characteristics of the measured heartbeat-evoked EEG potential (Schandry et al. 1986), a phenomenon that is not discussed in the present paper but raises many questions of its own (Ring and Brener 2018). The method of Schaan and colleagues (2019) investigates heart activity perception after physical activity. Leaving aside that signals other than heartbeats can also indicate increased HR, it is also questionable how much this tells us about resting heartbeat perception.

Comparison of techniques

There are various basic differences between these measurement types, i.e., change detection, discrimination, tracking tasks, mixed methods, and indirect measures; some of them are already mentioned briefly above.

Firstly, some measures require attention to rhythm (typically change detection and discrimination tasks), while others require focus on single events (typically tracking tasks). The ability captured by the two approaches might differ.

Secondly, measures are sensitive to non-conscious sensations to different degrees. Forced-choice tasks are sensitive to near-threshold stimuli, while measures applying signal detection theory are also able to differentiate between sensitivity and bias.

Thirdly, the calculated indices also differ substantially (but please note that this difference might not always be method-specific). While some methods provide a performance-based accuracy score only, others also determine the significance level by a hypothesis test.

Fourthly, procedures probably require different components of attention. On the one hand, the discrimination procedures require divided attention by focusing on both internal and external signals. On the other hand, the traditional Schandry task presumably requires other components of attention more strongly (Matthias et al. 2009; Vig et al. 2021). To date, no direct comparison of the tasks with respect to their attentional demands has been published.

Finally, it is probable that measures of interoceptive ability differ regarding the required cognitive effort, depending on the length and the difficulty of the task.

There are some studies that compare various heartbeat perception methods (Jones et al. 1984; Davis et al. 1986; Knoll and Hodapp 1992; Brener et al. 1993; Hart et al. 2013; Ring and Brener 2018; Pohl et al. 2021; Körmendi et al. 2022), but there is only one meta-analysis on this topic (Hickman et al. 2020). It reviewed papers that used both the mental tracking and the Whitehead-type discrimination tasks, pooling 22 studies. It revealed a small but significant correlation between the accuracy scores of the tasks, with a pooled effect size of 0.21 (p < 0.001) and an R² value of 0.044, i.e., about 4% of shared variance. Based on these results, the authors questioned the interchangeability of the two tasks. However, this result might be due to the unreliability of one or both tasks. To investigate the relationship between the two procedures, latent variable analysis would be helpful. Alternatively, a meta-analysis covering multiple tasks representing both tracking and discrimination procedures could be conducted.

Limitations of the current review

The current review is not without limitations. Firstly, it is not based on a systematic literature search. Secondly, it focuses on heartbeat perception tasks only which represent a relatively narrow (although popular) field of interoception research. Thirdly, the depth of description of the different methods varies and does not necessarily express the significance of the given method. We did not evaluate each method according to a standard set of criteria, nor did we cover studies that focused on empirical comparison of the measures.

Conclusions for future biology

Since the first versions, various tasks have been developed to measure the ability of heartbeat perception. The two main aspects that have to be taken into account are the circumstances of the measurement and the method used to calculate the accuracy score. Besides the classical, extensively criticized methods, there are various new versions that combine existing elements or involve new ones.

What is often not highlighted when a measure is described is whether it requires the detection of single beats or a rhythm and whether change detection is also involved. The ecological validity of the measures is also rarely emphasized. It is still an open question whether the perceptual ability during resting conditions corresponds with the ability when HR increases and becomes more informative for the participant. Is the distinction between detectors and non-detectors still valid under these circumstances? Some of the assumptions of the calculations are also not mentioned. Signal detection theory is preferred mainly because it deals with response bias. It assumes, however, ideal circumstances (e.g., a normal distribution of the underlying perceptual processes), which might not be the case.

When using a heartbeat perception method, the pros and cons of the given measure have to be carefully considered.