Introduction

Over the last decades, laboratory and field experiments have been designed to test theoretical models in many economic areas [23, 24], such as bargaining [77], auctions [52], public good provision [93] and finance [28]. More recently, economic experiments have proven to be a useful tool to test individual and organizational decision-making processes related to health and healthcare [34]. Controlled environments, such as the experimental laboratory, make it possible to test ex ante the effects of health policy changes, such as the introduction of new financial incentive schemes for physicians, while minimizing confounding effects when looking for causal links between variables. This feature is especially valuable in health economics, where agents’ behavior can affect individual well-being, has legal consequences and is ethically sensitive.

The merits of experimental methods have led to a fast-growing literature in health economics, addressing several topics: risk and time preferences (e.g., [33]), health insurance choices (e.g., [50]), providers’ incentives (e.g., [27]), altruism (e.g., [62]), competition (e.g., [12,13,14]), professional norms (e.g., [56]), malpractice (e.g., [30]) and medicine pricing policies (e.g., [90]). The design of experiments in health economics may also vary along several dimensions [34], such as the wording of instructions (neutral vs. health-related) and the type of participants (students, medical students, or physicians) joining the experimental sessions [35, 53, 86]. Therefore, a comprehensive, systematic, and reader-oriented review of experimental health economics may help to guide scholars through this new stream of experiments. We believe our work contributes to filling this gap in the literature.

We acknowledge that other scholars have previously reviewed parts of the existing literature, though with different purposes. Galizzi and Wiesen [34] critically discuss the state of the art, explaining the methodologies, debating potential areas of application of experiments to health, and suggesting scope for further research. Vlaev et al. [84] summarize the available literature on the use of financial incentives to change health behaviors. However, the former is better classified as a methodological paper than as a review, while the latter focuses on a very specific subject and lacks the comprehensiveness we aim to achieve with our work. Finally, in a special issue of the Journal of Economic Behavior and Organization, Cox et al. [20,21,22] provide an overview of laboratory experiments in four different topics of healthcare research (i.e., clinical decision support, physicians’ incentives, healthcare systems and insurance, healthcare delivery, and public health), emphasizing in their conclusions the strengths of experimental methods.

Thus, a comprehensive collection of the main contributions and their most relevant features from the supply-side perspective is still lacking. For this reason, we conduct a systematic literature review of the articles published in peer-reviewed journals since 2011 that examine laboratory experiments in health economics focused on the supply of health services. The initial bibliographic metadata is drawn from the SCOPUS database. Of the 1248 articles retrieved from 2011 onwards, 56 articles meet our quite selective inclusion criteria, which restrict attention to laboratory or online (hypothetical lab) experiments and to experiments whose data have been gathered by merging laboratory and artefactual field experiments or by combining lab/online experiments with non-experimental methods. For the sake of consistency, when a study combines a lab and a field experiment, we discuss only the aspects emerging from the former. Similarly, when lab or online experiments are merged with non-experimental methods (e.g., surveys or discrete-choice experiments), we look only at the experimental data. Indeed, laboratory experiments can complement field experiments and non-experimental methods [34] and may serve as a ‘wind tunnel’ before a policy change is actually implemented or a large-scale study is run [20,21,22]. According to Harrison and List [45], laboratory experiments are among the most appropriate methods for counterfactual analysis (especially in the health field), since randomization allows for the identification of a control group. In a nutshell, they set the gold standard for environmental control [19].
Online experiments represent the main alternative to experiments run in the lab (especially under COVID-19 restrictions), given the lower cost of compensating participants for their opportunity costs and the possibility of recruiting large samples of participants, which allow for high-powered testsFootnote 1 and are more representative of the general population [44]. By contrast, according to Charness et al. [19], field experiments are often infeasible, since they require sufficient variation and randomization and need to make the experiment ‘invisible’ to participants. Another drawback is that field experiments can hardly be replicated, whereas replicability is a key property of lab experiments [19]. Thus, we have decided to exclude field experiments, which do not allow for the level of experimental control that is critical for internal validity. We have also excluded discrete-choice experiments [25], controlled trials and quasi-experiments based only on hypothetical decisions (stated preferences), since the incentive structure and experimental control are fundamental aspects of experimental methods in economics (see [44]). Looking at the different areas of interest, we identify one main macro-topic, payment schemes, which covers a large portion of our dataset, although other research topics such as health insurance, competition and risk preferences are discussed too.

The remainder of the paper is organized as follows. “Background” presents the background and the method applied in the systematic literature review, showing some preliminary results on bibliographic “metadata”; “Basic summary of the sampled publications” summarizes the main features of the selected papers; “Review” describes the selected papers grouped by topic. Finally, “Concluding remarks” concludes.

Background

Literature review method

In this section, we outline the method and selection criteria used to review the literature. First, we elaborate upon our selection criteria regarding the lab experimental approach in the field of health economics and the supply side perspective.

According to Greenhalgh [41], a systematic review is an overview of primary studies that explicitly defines its objectives, materials, and methods and is conducted following an explicit and reproducible methodology. Writing a systematic review has three main advantages: it summarizes the existing evidence on a given topic, detects gaps that leave room for future research, and provides a framework that helps to position new research activities appropriately [57]. Although it shares some features with a traditional literature review, a systematic review must be regarded as a self-contained research project investigating a clearly defined issue [26]. Unlike systematic reviews, narrative reviews indicate neither the databases and methodologies followed to perform the review nor the inclusion criteria used to extract the dataset, thus preventing other authors from replicating the study [78]. Hence, we opt for a systematic review built in three main steps. First, we select the database to be investigated (for instance Scopus, Web of Science, PubMed, Google Scholar, etc.) and, by looking at the papers, we identify the keywords that allow us to build our search string. At this stage, we also select the papers to be analyzed by defining our inclusion/exclusion criteria. In the second stage, we provide a descriptive and content analysis of the papers included in our sample. Finally, we focus on each of the selected papers, summarizing its contents and comparing different experimental settings and findings. Figure 1 reports the main steps of the literature search and identification of studies.

Fig. 1 Main steps of the literature search and identification of studies; Source: our elaboration

Several bibliographic databases containing articles in peer-reviewed journals and other types of publications could serve as data sources for a systematic review (for instance, Google Scholar, Web of Science (incl. MEDLINE), Scopus, EconLit, etc.). When choosing the most appropriate database, one relevant consideration is whether it has a classification system that balances two conflicting goals: (1) covering a wide range of the outlets most suitable for publishing papers on our topic; (2) differentiating among publication subjects (e.g., [18]). Following the approach of Robinson and Botzen [76], we opt for the SCOPUS database in conjunction with Google Scholar, applying a parallel check through snowballing. Indeed, the SCOPUS database spans from the general field of health to the more specialized fields of health economics and experimental economics, offering a fairly accurate definition of subject areas and good coverage of citation data in scholarly journals. Thus, we are confident that the SCOPUS database covers a very large proportion of the experimental approaches used in health economics. Below, we explain why we are reasonably confident that our sample is as inclusive as possible of the literature that meets our inclusion criteria.

Data collection

Once the database has been chosen, our systematic literature review process moves to data collection. We ran a database search to inform this review in July 2023.Footnote 2 As previously explained, we limit our search to SCOPUS, using a search string which includes the words lab, experiment, physician, and economic.Footnote 3

To guarantee the quality of the selected works, we consider articles written in English and published in peer-reviewed journals from January 2011 onwards. This search returned 1232 papers covering a broad selection of topics and subjects.

The selection criteria are based on types of studies, types of experimental approaches, and types of topics. For a study to be included, it must deal with health economics topics, adopt the experimental methodology, look at the supply side, and be a laboratory or an online experiment. To state our inclusion rule precisely, we consider eligible only papers in which subjects, either in the lab (possibly merged with artefactual field sessions) or over the Internet, have been asked to provide health care services under different economic incentives. For papers whose dataset is obtained by merging different types of methods, we include only the results coming from laboratory or online sessions. Consequently, we consider all other settings ineligible. Thus, we have excluded all papers not related to health economics issues and those not applying the experimental methodology,Footnote 4 at least mainly, in a controlled setting. Furthermore, among the health economics experimental papers, we have excluded all experiments run in the field and natural experimentsFootnote 5 as well as discrete-choice experiments, randomized controlled trials and quasi-experiments, experiments based on hypothetical choices without economic incentives for participants, and experiments on health-related behavior [75].Footnote 6
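The inclusion rule above is essentially a conjunction of conditions applied to each screened record. The Python sketch below encodes it as a minimal illustration; the `ScreenedPaper` fields and the setting labels are hypothetical names introduced here, and the actual screening was carried out by reading titles, abstracts and full texts, not by code.

```python
from dataclasses import dataclass

# Hypothetical labels for the experimental setting of a screened paper.
ELIGIBLE_SETTINGS = {
    "lab",                             # laboratory experiment
    "online",                          # experiment run over the Internet
    "lab+artefactual_field",           # lab data merged with artefactual field sessions
    "lab_or_online+non_experimental",  # only the experimental part is reviewed
}


@dataclass
class ScreenedPaper:
    health_economics: bool  # deals with a health economics topic
    experimental: bool      # mainly applies the experimental method in a controlled setting
    supply_side: bool       # subjects provide health care services (not health-related behavior)
    setting: str            # one of the labels above, or e.g. "field", "natural", "dce", "rct"
    incentivized: bool      # participants' payoffs depend on their decisions


def is_eligible(paper: ScreenedPaper) -> bool:
    """Apply the inclusion rule as a conjunction of the stated criteria."""
    return (
        paper.health_economics
        and paper.experimental
        and paper.supply_side
        and paper.incentivized
        and paper.setting in ELIGIBLE_SETTINGS
    )


lab_study = ScreenedPaper(True, True, True, "lab", True)
field_study = ScreenedPaper(True, True, True, "field", True)
unincentivized = ScreenedPaper(True, True, True, "online", False)
print(is_eligible(lab_study), is_eligible(field_study), is_eligible(unincentivized))
# True False False
```

Writing the rule as a single conjunction makes explicit that failing any one criterion (for instance, the lack of economic incentives) is sufficient for exclusion.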

After reading titles, abstracts, and keywords, we excluded 1151 articles that did not meet our inclusion criteria. The full texts of the remaining 65 articles were read in parallel by two researchers and, in the case of disagreement, by a third one (the so-called benefit of the doubt rule). Only 43 papers passed the final selection, meeting our eligibility criteria, mainly in terms of experimental settings.

Since our search strategy and our choice of the SCOPUS database may have missed some important references on the topic, we apply a parallel check through snowballing, using the reference list of each paper and tracking citations with a generic search engineFootnote 7 (i.e., Google Scholar). In this way, 13 further papers were added after looking at the references of the 43 selected papers, leading to a final sample of 56 papers. Figure 2 depicts the PRISMA flow diagram.
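The final stages of the flow can be checked with simple arithmetic: of the 65 full texts read, 43 were retained (hence 22 excluded), and snowballing added 13, giving 56. The helper `prisma_tail` below is a hypothetical bookkeeping sketch covering only these final stages.

```python
def prisma_tail(full_text_assessed: int, full_text_excluded: int, added_by_snowballing: int) -> dict:
    """Final-stage PRISMA bookkeeping: full-text screening followed by snowballing."""
    if full_text_excluded > full_text_assessed:
        raise ValueError("cannot exclude more papers than were assessed")
    included_from_database = full_text_assessed - full_text_excluded
    return {
        "included_from_database": included_from_database,
        "added_by_snowballing": added_by_snowballing,
        "final_sample": included_from_database + added_by_snowballing,
    }


# 65 full texts read, 43 retained (so 22 excluded), 13 added by snowballing.
print(prisma_tail(65, 22, 13))
# {'included_from_database': 43, 'added_by_snowballing': 13, 'final_sample': 56}
```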

Fig. 2 PRISMA flow diagram of the systematic literature review; Source: our elaboration

Although we have been careful in constructing our sample, we recognize that a limited number of studies may have been missed because the search did not retrieve them or because reference checking and snowballing did not identify them. Nevertheless, we are reasonably confident that the included studies provide a comprehensive and up-to-date overview of the literature on the laboratory experimental approach in the field of health economics.

Basic summary of the sampled publications

In this section, we provide some descriptive statistics of the sampled publications. First, we show a synoptic table which collects all the reviewed papers, differentiating them by sample size and subject pool. We then discuss publication trends by year, journal and subject area.

Synoptic table

Table 1 lists the 56 papers of our sample and distinguishes them by the outlet, the topic, the sample size, whether the study was conducted over the Internet or resulted from a combination of experimental and non-experimental (i.e., extra-laboratory) methods, the employment of either medical students or physicians in the experiment and the number of citations. Numbers in parentheses in the last two columns indicate the specific number of that subject pool joining the experiment.

Table 1 Main features of the sample; Source: our elaboration on Scopus database

The average sample size of the selected papers is 210.36, ranging from a minimum of 23 to a maximum of 925 participants. Statistics change sharply when we exclude online experiments from the calculation, with an average sample size of 191.83 (range 23–608), since studies conducted over the Internet allow for the recruitment of larger samples than those usually employed in laboratory experiments. Fourteen studies employed only nonmedical students. Physicians joined eleven experiments,Footnote 8 four of which were run online and four of which also employed medical students. Restricting our attention to physicians, on average 64.81 of them take part in each experiment, ranging from a minimum of 4 to a maximum of 99. Again, the figures change markedly when we exclude online experiments, with an average of 44.57 physicians.
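Summary statistics like those above are computed on per-study sample sizes, with and without the online experiments. The sketch below shows this computation on three invented studies; the `Study` records and their values are hypothetical and do not reproduce Table 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Study:
    name: str         # hypothetical identifier
    sample_size: int  # number of participants
    online: bool      # True if the experiment was run over the Internet


def sample_size_summary(studies, include_online=True):
    """Return (mean, min, max) of sample sizes, optionally dropping online studies."""
    sizes = [s.sample_size for s in studies if include_online or not s.online]
    if not sizes:
        raise ValueError("no studies left after filtering")
    return mean(sizes), min(sizes), max(sizes)


# Invented values, NOT the reviewed studies.
studies = [Study("A", 40, False), Study("B", 110, False), Study("C", 300, True)]
print(sample_size_summary(studies))                        # (150, 40, 300)
print(sample_size_summary(studies, include_online=False))  # (75, 40, 110)
```

Because online studies tend to be larger, dropping them lowers both the mean and the maximum, which is the pattern described in the text.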

The small number of studies including physicians is not surprising, since running experiments with real physicians is not an easy task. As shown by Rahman et al. [72] in the context of clinical trials, physicians’ unwillingness to join sessions is due to several participation barriers, such as lack of time, lack of incentives and recognition, communication problems, absence of any research experience and, in some circumstances, ‘a scientifically uninteresting research question’, which leaves them uninvolved. The same authors suggest adopting financial rewards to encourage doctors’ participation. However, physicians’ opportunity cost is very high, especially when compared with students’ opportunity cost, which is traditionally quite low. A possible solution could be to reach physicians via email and let them join the experiment online. Allowing physicians to complete the experiment wherever and whenever they want, without interfering with their work schedule, would likely raise the proportion of physicians agreeing to participate, making them more willing to contribute to the research and reducing their convenience costs. Admittedly, this solution comes at the cost of a partial loss of experimental control, which may result in behavioral differences related to the so-called observer effect (due to the absence of an experimenter supervising the sessions). However, many studies find little evidence of differences between behavior in the lab and outside [3, 61, 82]. Thus, the observer effect does not appear to significantly affect the experimental results.

Finally, looking at citation frequency, Hennig-Schmidt et al. [46] stands out with 144 citations. This result has two main explanations. First, it is the oldest paper included in our review. Second, and most importantly, it laid the foundation of the artificial environment (i.e., the design) used to study physicians’ decisions on the services provided to patients under different payment structures. Hennig-Schmidt et al.’s [46] design has since been replicated with different subject pools (e.g., [11]) and different payment systems (e.g., [59]). Additionally, several variants of the standard design have been introduced to test competition [16], altruism [91] and so on (see “The role of payment schemes”).

Analysis of search resultsFootnote 9

Figure 3 shows the number of published papers by year.

Fig. 3 Documents by year; Source: our elaboration on Scopus database

The first paper was published in 2011, followed by one paper in 2012, three papers each in 2013 and 2014, and one paper in 2015.Footnote 10 In 2016–2017, we observe two peaks, with eight papers published in 2016 and ten in 2017. As noted by Cox et al. [20,21,22], in recent years the advantages of experimental methods applied to healthcare have emerged, leading many authors to employ the behavioral approach to investigate many issues in this field. Finally, from 2018 to the present year, we observe a volatile pattern, possibly due to the exclusion of field experiments from our review. However, we expect a downward trend in the period 2021–2023 due to the outbreak of COVID-19 in 2019–2020, when national restrictions prevented experimenters from running sessions. This drop has been partially compensated by the use of online experiments (e.g., [90, 92]).

Figure 4 reports the documents per year by source.

Fig. 4 Documents per year by source; Source: our elaboration on Scopus database

We restrict our attention to the top three journals in terms of number of published papers: Health Economics (12 papers), Journal of Economic Behavior and Organization (10 papers) and Social Science & Medicine (3 papers), which together account for 45% of the sampled papers (25 out of 56). Figure 4 mirrors Fig. 3 to some extent, with two peaks in the period 2016–2017 corresponding to Journal of Economic Behavior and Organization and Health Economics, most likely due to the launch of the special issues ‘Experimental and Behavioral Economics of Healthcare’Footnote 11 and ‘Behavioural experiments in Health supplement’,Footnote 12 respectively. Finally, Fig. 5 differentiates documents by subject area.
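The 45% figure follows directly from the journal counts reported above, as the short Python check below shows (25/56 is about 44.6%, which rounds to 45%).

```python
# Paper counts per journal, as reported in the text.
top_three = {
    "Health Economics": 12,
    "Journal of Economic Behavior and Organization": 10,
    "Social Science & Medicine": 3,
}
sample_size = 56

covered = sum(top_three.values())
share = covered / sample_size
print(f"{covered}/{sample_size} = {share:.1%}")  # 25/56 = 44.6%
```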

Fig. 5 Documents by subject area; Source: our elaboration on Scopus database

As expected, 21% of the selected papers fall within Economics, Econometrics and Finance, while 30% belong to Medicine, followed by Business, Management and Accounting (16%), Social Science (11%) and Arts and Humanities (6%). The remaining papers are spread almost evenly across the other areas.

Review

In the following paragraphs, we first focus on the papers investigating the role of payment schemes in physicians’ decisions, which account for 55.4% of our sample (31 out of 56), and then on the works that do not fall into any specific research topic group.

The role of payment schemes

Measuring how physicians respond to payment schemes is the most common topic among experiments in health economics that look at the provision of healthcare services (i.e., supply side). In our systematic review, 31 articles out of 56 deal with physicians’ payment schemes. Table 2 summarizes the topic investigated and the main results of each paper.

Table 2 Studies in the sample that explore the role of payment schemes; Source: our elaboration

Most of these experiments have been run with students, whereas only a few have involved medical students (see for instance [20, 59, 71]) or physicians (see for instance [74, 86]).

Although not all papers assess payment schemes as the main objective of their research question, a relevant portion of works focuses on physicians’ behavior in medical service provision under different payment schemes. Hennig-Schmidt et al. [46] were the first to test the theoretical predictions of Ellis and McGuire’s [29] seminal model, showing that physicians’ treatment decisions are affected by payment systems. In Hennig-Schmidt et al. [46], participants acting as physicians choose the quantity of services to provide to patients who vary in severity of illness (i.e., low, medium, and high), under two alternative payment schemes: capitation (CAP) and fee for service (FFS).Footnote 13 Brosig-Koch et al. [11] compare medical and nonmedical students’ behavior, showing that the former are less affected by financial incentives than the latter.Footnote 14 The experimental design of Hennig-Schmidt et al. [46] has been replicated by several other authors to test the impact of different monetary incentives, such as mixed systems [12,13,14], report cards [40], salary [59] and diagnosis-related groups (DRG) [66]. Similarly, Brosig-Koch et al. [12,13,14] test whether subjects’ ex ante preferences for either CAP or FFS, elicited through the strategy method [81], can explain their ex post treatment decisions in the lab. Using the same payment schemes, Hennig-Schmidt and Wiesen [47] measure patient-regarding motivations among medical and nonmedical students, showing that medical students are more altruistic and more willing to sacrifice their own profit than nonmedical students. Still focusing on CAP and FFS, Keser et al. [54] make participants face a progressive reduction in the lump-sum payment (i.e., CAP) to test whether physicians react by customizing care at the individual patient level and whether this result is also observable under FFS. Inspired by Brosig-Koch et al. [12,13,14], Zhang et al. [91] demonstrate how the shift from pure payment systems (either FFS or DRG) to mixed ones, with DRG and FFS components in different weights, increases patients’ benefits (confirming [66]). The average value of the altruism parameter found by Zhang et al. [91] approaches that found by Brosig-Koch et al. [12,13,14], but it is still lower than that obtained by Zhang et al. [92]. Using a combination of surveys, discrete-choice experiments and an online experiment,Footnote 15 Zhang et al. [92] demonstrate that prospective physicians with weak altruistic motives would opt for high-income specialties and would be less willing to accept jobs in rural areas, consistent with a wide range of literature [6, 59, 63]. Patient-regarding preferences are instead analyzed by Wang et al. [86], who compare medical students and real physicians and show that such preferences do not differ significantly across the two subject pools, in contrast with the findings of Hennig-Schmidt and Wiesen [47] and Kairies-Schwarz and Souček [53].

Also drawing on the design of Hennig-Schmidt et al. [46], Finocchiaro Castro et al. [30] introduce a random probability for a physician to be sued for malpractice to test its effect on medical service provision, whereas Martinsson and Persson [68] propose a patient health benefit function to show how physicians’ altruism varies with patients’ medical needs. Under the same design, Godager et al. [39] show that disclosing information on providers’ performance to their peers improves the quality of care under FFS.

Departing from Hennig-Schmidt et al. [46], Di Guida et al. [27] investigate how physicians under FFS allocate services over 36 working days to patients with different responsiveness to treatment, highlighting that resource constraints might deter overprovision. The influence of resource limitations on physicians’ patient prioritization is confirmed by Oxholm et al. [70] in a laboratory experiment where medical students are incentivized by CAP, differentiated CAP (i.e., the fixed amount varies with patients’ needs), and salary. Similarly, Oxholm et al. [71], distinguishing patients by treatment responsiveness, demonstrate that the redistribution of services is stricter under pay-for-performance when resource constraints are at play.Footnote 16 Comparing pay-for-performance with bonus-malus incentives to a simple DRG system, Kairies-Schwarz and Souček [53] find that the former improves the quality of care depending on the size of the DRG fee as well as on physicians’ initial orientation towards the patient. Coming back to CAP and FFS, Lee et al. [62] show that doctors’ number of prescriptions is driven by monetary incentives rather than by patients’ severity of illness.

Differently from previous works, Reif et al. [74] account for the presence of an insurer who must budget for physicians’ cost of providing services to patients whose health status can be misreported. Dishonesty is also investigated by Hennig-Schmidt et al. [48], who underline the need to introduce an audit probability to deter fraudulent reporting of information that determines reimbursement rates (i.e., obstetricians reporting birth weights).Footnote 17 Referral rates are instead the focus of Waibel and Wiesen [85], who show that when referral fees are increased, the number of referrals rises regardless of the patient type. Brosig-Koch et al. [15] study the introduction of a bonus payment for providing information when referring a patient to a specialist. Subjects playing the role of primary care physicians (PCP) decide whether to pass on low- or high-quality information, while the specialist automatically provides the optimal treatment to the patient. Several experimental conditions are tested: a change in the beneficiary of information (specialist vs patient), in the relative payoff of the PCP and the specialist, and in the bonus and the additional capitation payment. Supporting the authors’ theoretical model, the data show that introducing the bonus payment increases PCPs’ likelihood of providing information. Cox et al. [20] instead use variations in payment systems to assess hospital readmission rates, demonstrating that pay-for-performance incentives together with a decision support system lead to more cost-effective discharge decisions. The effectiveness of pay-for-performance is also confirmed in the laboratory experiment by Bardey et al. [7], who assess the impact of monetary incentives on the use of personalized medicine. Irvine et al. [51] discuss how participants react to computerized patients’ non-adherence to medical prescriptions under two different payment conditions.
Under the first payment condition (the individual incentive), physicians are paid if the patient adheres to the treatment recommendation; under the second, the physician receives a salary that is independent of the patient’s outcome. Moving away from the above-mentioned topics, Wettstein and Boes [90] investigate two pharmaceutical pricing options in an online experiment: a cost–benefit-based option and an outcome-based one. Participants must buy or sellFootnote 18 a sealed envelope containing a donation of unknown amount to a patient association, or instead choose an alternative with a known price and patient benefit. Subjects are paid a salary for completing the task. Depending on the treatment, in addition to reaching an agreement on the price, participants are required either to agree on the resulting patient’s benefit, with the price of the concluded negotiation cut in half otherwise (cost-sharing treatment), or to estimate the donation contained in the sealed envelope (risk-sharing treatment). The outcomes of the negotiations, in terms of both patient’s benefit and offer prices, are significantly affected by the existence of available alternatives. Wettstein and Boes [90] draw their experimental design from Wettstein and Boes [88], who, rather than evaluating the impact of value-based interventions, focus only on the impact of price negotiations for life-extending drugs on societal outcomes. Here, in addition to the salary (as in [90]), participants may also receive a bonus which depends on their preferences and on the offer and counteroffer prices. Furthermore, participants are divided into two groups according to different price-magnitude framings: the 100 k$ group with fictive real-world prices and the 1$ group with real payoff prices. The results, later confirmed in the follow-up study [89],Footnote 19 show that offer prices and successful negotiations depend on the price magnitude framing.

Finally, moving to another common topic, financial incentives are used as a tool to investigate how competition between providers affects physicians’ provision behavior under CAP and FFS [12,13,14]. Similarly, Brosig-Koch et al. [16] test how physicians incentivized by FFS respond to competition when facing a heterogeneous patient population. Here, patients differ in health status (high/low severity), which represents the novelty with respect to Brosig-Koch et al. [12,13,14], and in responsiveness to the quality of services provided.Footnote 20 The data show that patients’ responsiveness to treatment crucially determines the effect of competition among clinicians.

Table 3 indicates, for each study, whether a theoretical model is reported and whether its predictions are fully or partially confirmed by the experimental results. What clearly emerges from the use of experimental methods in the context of financial incentives in health is that providers are not driven solely by monetary rewards. The theoretical model of Ellis and McGuire [29] suggests that remunerating physicians through retrospective payment systems (e.g., FFS) leads to overprovision of health services, while the reverse takes place under prospective payment schemes (e.g., CAP). Although these predictions are confirmed in the experimental context, they are attenuated by additional factors that are hard to capture in a formal model. First, incentive schemes are not uniformly perceived by physicians, but depend on their work experience and degree of altruism [47, 86]. Additionally, reactions to incentives are less clear-cut when providers can refer patients to specialists, when they are informed about their peers’ performance, or when they face budget and resource constraints [39, 70, 85]. Furthermore, when taking decisions under given economic incentives, providers also take into account the patient’s severity of illness and responsiveness to treatment [16, 27, 71]. These aspects are difficult to include in a single theoretical model. This is why experimental methods make it possible not only to test existing theoretical predictions but also to derive insights to inform policy decisions.
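To make the intuition behind these predictions concrete, the sketch below assumes a physician who maximizes profit plus a weight `alpha` on patient benefit, in the spirit of the agency model discussed above. All functional forms and parameters (a linear FFS margin, a lump-sum CAP with a per-service cost, a quadratic patient benefit peaking at five services) are invented for illustration and are not the parameters of Hennig-Schmidt et al. [46] or of any other reviewed study.

```python
# Hypothetical parameters, for illustration only.
QUANTITIES = range(11)   # the "physician" chooses 0..10 units of service
PATIENT_OPTIMUM = 5      # quantity that maximizes patient benefit (assumed)
FFS_MARGIN = 1.0         # FFS: fee per service minus its cost (assumed)
CAPITATION = 10.0        # CAP: lump sum per patient (assumed)
UNIT_COST = 1.0          # CAP: cost borne for each service provided (assumed)


def profit(q: int, scheme: str) -> float:
    if scheme == "FFS":
        return FFS_MARGIN * q              # every extra service raises profit
    if scheme == "CAP":
        return CAPITATION - UNIT_COST * q  # every extra service lowers profit
    raise ValueError(f"unknown scheme: {scheme}")


def patient_benefit(q: int) -> float:
    return -float((q - PATIENT_OPTIMUM) ** 2)  # concave, peaks at PATIENT_OPTIMUM


def chosen_quantity(scheme: str, alpha: float) -> int:
    """Quantity maximizing profit + alpha * patient benefit (alpha = altruism weight)."""
    return max(QUANTITIES, key=lambda q: profit(q, scheme) + alpha * patient_benefit(q))


for alpha in (0.0, 0.5, 100.0):
    print(alpha, chosen_quantity("FFS", alpha), chosen_quantity("CAP", alpha))
# 0.0 10 0
# 0.5 6 4
# 100.0 5 5
```

With these assumed numbers, a purely profit-driven agent chooses the maximum quantity under FFS and none under CAP, a moderately altruistic agent overprovides slightly under FFS and underprovides under CAP, and a strongly altruistic agent chooses the patient optimum under both schemes, which mirrors the attenuation of incentive effects discussed above.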

Table 3 Consistency between theoretical predictions and experimental results for studies on payment schemes; Source: our elaboration

Other topics in the provision of health services

Several papers in our pool cannot be assigned to a specific group, as they address a variety of health topics such as resource allocation, health insurance decisions and competition. Table 4 summarizes the specific topic investigated and the main results of each paper.

Table 4 Other topics in the provision of health services explored in the sample; Source: our elaboration

For instance, Ahlert et al. [1] ask economics and medical students to allocate a given amount among seven potential recipients who differ in the amount needed to obtain a positive payoff, under either a neutral or a medical framing (Footnote 21). Results show that economics students are significantly affected by the experimental setting, behaving more often as payoff maximizers under the neutral framing than under the medical one. Economics, law and medical students face similar tasks in the questionnaire experiment by Ahlert et al. [2] (Footnote 22). Here, a significant difference between medical and economics students in allocation decisions is clearly observed, with law students making choices close to those of their medical colleagues. Brendel et al. [9] examine how resource scarcity affects medical service provision. Medical and nonmedical students in the role of physicians decide how many services to provide to patients with varying characteristics, under different budget constraints. Results reveal that, under tighter budget constraints, patients receive fewer services and their health benefits decrease.

To address the role of altruistic preferences (Footnote 23) in medical decisions, Kolstad and Lindkvist [58] combine results from a dictator game with questionnaire responses from medical students and nurses to investigate whether their social preferences affect their willingness to work in the public or the private sector in Tanzania. Results show that medical students who prefer to work in the public sector display more pronounced pro-social preferences than those opting for the private sector (see [92]). In the same setting, Brock et al. [10], combining a laboratory experiment with field data (Footnote 24), measure clinicians’ generosity through a dictator game in which the clinician is the dictator and a participant from the standard subject pool is the receiver. Data show that the majority of physicians split the endowment equally between themselves and the other person. Similarly, Kesternich et al. [56] investigate how medical students trade off their own profit, the patient’s benefit, and the payment made by a third party for medical treatment, in the context of professional norms. After being exposed to a version of the Hippocratic Oath, participants play eight standard dictator games and four cost dispersion games (Footnote 25). Treatments vary in the salience of the professional norm, the framing (neutral vs medical), and the identity of the receiver (a student vs a real charity). The introduction of the Hippocratic Oath is found to increase participants’ altruistic motivations.

A graphical version (Footnote 26) of the dictator game is used in the web-based experiment by Li et al. [65] to investigate physicians’ altruism and equality-efficiency orientation (Footnote 27). US practitioners from different specialties are asked to distribute real money between themselves and an anonymous party, and the cost of giving to the other party varies across allocation decisions. Results are compared with data from previous experiments and show that physicians are more altruistic than both a general-population sample and a cohort of medical students, but less efficiency-focused than medical students. The same methodology had previously been adopted by Li et al. [63] to study the social preferences of first- to fourth-year medical students in the US (see also [64]; Footnote 28). Data show that medical students are significantly less altruistic and more efficiency-oriented than the average American population. Moreover, students from top-ranked universities are less altruistic than those from lower-ranked universities and exhibit social preferences similar to those of a pool of elite law students. In contrast, Attema et al. [6], combining lab and online sessions, ask German medical students at different stages of their training to choose between two treatment alternatives for each of 30 stylized patients, where the two options represent a trade-off between the patient’s benefit and the physician’s profit (Footnote 29). In general, students tend to make patient-oriented decisions, although their altruism declines with seniority. Patient-regarding behavior differs significantly between medical and nonmedical students, with the former being more altruistic than the latter. Finally, prospective physicians with higher income expectations put less weight on the patient’s benefit relative to their own profit (confirming [92]).
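To make the equality-efficiency distinction used by Li et al. concrete, the sketch below assumes a budget-set version of the dictator game in which the dictator picks a point on the line self + price × other = endowment, so the price sets the cost of giving. The parametrization, the grid search, and the three benchmark preference types (selfish, equality-focused Leontief, efficiency-focused utilitarian) are illustrative assumptions, not the exact design or estimation procedure of the cited studies.

```python
"""Hypothetical sketch of a budget-set (graphical) dictator game: the dictator
chooses an allocation on the line  self + price * other = endowment."""
from typing import Callable, Tuple

Allocation = Tuple[float, float]  # (payoff to self, payoff to other)


def choose(utility: Callable[[float, float], float], endowment: float,
           price: float, steps: int = 100) -> Allocation:
    # Grid search over the amount of the endowment the dictator keeps.
    best = None
    for i in range(steps + 1):
        kept = endowment * i / steps
        other = (endowment - kept) / price  # each unit received costs `price`
        if best is None or utility(kept, other) > utility(*best):
            best = (kept, other)
    return best


def selfish(s: float, o: float) -> float:
    return s  # cares only about own payoff


def equality(s: float, o: float) -> float:
    return min(s, o)  # Leontief: prefers equal payoffs


def efficiency(s: float, o: float) -> float:
    return s + o  # utilitarian: maximizes total payoff


if __name__ == "__main__":
    for price in (0.5, 2.0):
        print(price, choose(equality, 100.0, price), choose(efficiency, 100.0, price))
```

Under these assumptions, an efficiency-focused dictator gives everything away when giving is cheap (price below 1) and keeps everything when it is expensive, whereas an equality-focused dictator always moves toward equal payoffs; varying the price across decisions is one way to tell the two orientations apart.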

Ge et al. [36] study physicians’ patient-regarding preferences with a specific focus on cost-sharing, combining discrete-choice methods (used to design the choice menus) with health economics experiments. Medical students make 23 choices between two treatment alternatives for a hypothetical patient who must pay an out-of-pocket fee for the treatment received. The two options differ in terms of the physician’s profit, the patient’s health benefit, and the patient’s consumption opportunities after cost-sharing. In this way, participants’ decisions determine the co-payment and the money available to the patient after treatment, which is the difference between the initial endowment and the co-payment. Data show that medical students care about both the patient’s health benefit and consumption opportunities, although the former weighs more than the latter in driving treatment decisions.

Moving to health insurance, Huck et al. [50] investigate the effects of both insurance and competition on the interaction between patients and physicians. Patients, who either pay the whole cost of treatment or, in the insurance condition, share it with all other patients, choose whether to consult a physician and, in the competition condition, which physician to consult. Physicians, instead, choose the treatment to provide. Under the insurance condition, patients consult the physician more frequently, and physicians are more likely to overtreat. The latter effect is mitigated when competition is introduced. The effects of market competition on medical treatments are assessed in Ge and Godager [35]. Participants acting as physicians select the medical services to provide under three different market conditions: monopoly, duopoly and quadropoly. Results show that participants are more patient-oriented in their decisions when competition is stronger. The outcomes of a hypothetical merger among competing hospitals are discussed in Han et al. [43]. Participants in the role of a hospital head decide on the quality of services to provide to patients before and, where applicable, after a merger. Participants’ choices reveal that mergers do not improve quality. Closely related to competition, Mimra et al. [69] address the role of second opinions in a lab experiment where participants are randomly assigned the role of physician or patient. Physicians decide whether to overtreat a patient, while patients can ask for a second consultation at a high or low cost, depending on the experimental treatment. Overtreatment is mitigated when second consultations are available. When search costs are reduced, patients overuse second opinions.

Martin-Lapoirie [67] examines how teamwork among healthcare providers affects individual precaution behavior under different liability scenarios. Subjects acting as healthcare professionals select an effort level for each consultation, while dummy (computerized) patients either consult the same physician twice or consult two different physicians. Results show that strict liability and the negligence rule (Footnote 30) lead to similar precaution behaviors. In their laboratory experiment, Angerer et al. [4] investigate whether monitoring physicians, either randomly or at the patient’s request, can prevent misbehavior such as undertreatment, overtreatment, and overcharging. Data show that both endogenous and exogenous monitoring reduce undertreatment and overcharging and improve market efficiency. Cox et al. [21] investigate how introducing a clinical decision support system (CDSS) affects the hospital discharge decisions of physicians and fourth-year medical students. The CDSS recommendations report the patient’s probability of readmission, an outcome that is costly to the provider when it follows an incorrect early discharge. Results provide evidence that a CDSS is an effective tool to improve discharge decisions.

Prescription behavior is the focus of Greiner et al. [42], who use a lab experiment to test the effects of separating prescription from treatment. In the baseline condition, the physician sets the prices of the possible treatments, while the patient decides whether to consult the doctor and whether to undertake the suggested treatment. Under a different experimental condition, the patient interacts with two doctors: the first is in charge only of prescribing (free of charge), while the second only provides the previously prescribed treatment. Although this second condition reduces overtreatment, it lowers efficiency because of miscoordination between the two doctors.

Cao and Liu [17] study how concurrent tasks affect diagnostic decisions. Participants complete three single-task conditions and two dual-task conditions. The tasks comprise a visual task (an abstract diagnostic decision-making task) and two auditory tasks (a sound-monitoring task and a memorization task). In each task, participants, after optionally requesting additional diagnostic tests, indicate the disease from which the hypothetical patient suffers. Diagnostic performance deteriorates when tasks are performed simultaneously. The effect of information overload on clinical decision-making is addressed in Laker et al. [60]. Practicing physicians review a fictitious medical scenario and report their preferred care plan for the hypothetical patient. In the experimental condition, physicians are given an emphasis frame, in which the salient components of the information on the patient’s medical scenario are highlighted, to minimize the effect of information overload.

Organ donation is addressed in Kessler and Roth [55] and Herr and Normann [49]. In the former, college students play a neutrally framed game in which they have the opportunity to register as organ donors under different allocation rules. Results show that a loophole allowing subjects to register, and thereby obtain priority, while still refusing to donate organs undermines the increase in donations generated by the priority rule. In the latter experiment, medical and nonmedical students first play several rounds of a donation game and, having experienced it, then vote on whether to implement a priority rule in the final rounds of the game. Two-thirds of the participants express a preference for the priority rule.

Finally, the remaining papers focus on measuring participants’ risk and time preferences. As reported in the literature, risk preferences are domain-dependent (see, e.g., [87]; Weber et al. 2002); hence, several authors measure risk attitudes across different contexts before drawing conclusions. For instance, in their laboratory experiment, Arrieta et al. [5] measure medical and nonmedical students’ risk preferences when deciding for others in both the monetary and the health domain, using the multiple price list method of Holt and Laury (2002) (HL). Participants, in the role of a physician making decisions in three different health contexts, choose the treatments to provide to patients. Depending on the context, health gains are expressed as years of life for a patient with varying health conditions or as hours of pain alleviated. Results confirm that risk attitudes are specific to the health context. Additionally, students with a medical background are found to be more risk-averse than their peers and, surprisingly, this attitude is more pronounced in the health domain. Similarly, Rapis et al. (2017) combine simulated vignettes, surveys, and behavioral experiments to study the association between clinicians’ risk preferences and therapeutic prescriptions in atrial fibrillation. In the experiment, physicians are first asked to choose between a visually presented option with known outcome probabilities and an alternative option with unknown probabilities for the same outcomes, where a gray bar indicates the degree of uncertainty about the winning probability of the second option. Second, physicians make a similar choice in the health context, where the gray bar indicates the degree of uncertainty about the survival probability. Data show that physicians are more willing to select ambiguous options in the health domain than in the financial domain. These results differ from those of Saposnik et al. [80] (Footnote 31). Using the same combination of methods as Rapis et al. (2017) and adding a measure of risk aversion, Saposnik et al. [80] find that neurologists are more reluctant to choose ambiguous options in the health domain. Moreover, high aversion to uncertainty leads to treatment inertia in the management of multiple sclerosis (Footnote 32).
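Since several of these studies build on the HL multiple price list, a minimal sketch of its monetary version may help. It uses the original low-stakes payoffs of Holt and Laury (2002): a “safe” option A paying $2.00 or $1.60 and a “risky” option B paying $3.85 or $0.10, with the probability of the high payoff rising from 0.1 to 1.0 across ten rows. The constant relative risk aversion (CRRA) utility used to map a coefficient r into a number of safe choices is a common but assumed modeling choice; the health-domain variants in Arrieta et al. [5] (years of life, hours of pain alleviated) are not modeled here.

```python
"""Sketch of the Holt and Laury (2002) multiple price list (monetary version,
original low-stakes payoffs), with an assumed CRRA decision-maker."""
import math

A_HIGH, A_LOW = 2.00, 1.60  # option A, the "safe" lottery
B_HIGH, B_LOW = 3.85, 0.10  # option B, the "risky" lottery


def crra(x: float, r: float) -> float:
    # CRRA utility; r = 0 is risk neutrality, r > 0 risk aversion.
    if math.isclose(r, 1.0):
        return math.log(x)
    return x ** (1 - r) / (1 - r)


def safe_choices(r: float) -> int:
    # Count the rows (out of 10) in which the agent prefers option A.
    count = 0
    for row in range(1, 11):
        p = row / 10  # probability of the high payoff in this row
        eu_a = p * crra(A_HIGH, r) + (1 - p) * crra(A_LOW, r)
        eu_b = p * crra(B_HIGH, r) + (1 - p) * crra(B_LOW, r)
        if eu_a > eu_b:
            count += 1
    return count


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0):
        print(r, safe_choices(r))
```

A risk-neutral agent (r = 0) makes four safe choices before switching to option B; higher values of r produce later switching points, which is why the number of safe choices is read as an index of risk aversion.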

Concluding remarks

Our study provides a systematic review of the literature applying behavioral and experimental methods to health issues concerning different aspects of the provision of health services. This has not been an easy task. Many studies have been incorrectly classified as ‘experiments in behavioral health’, although their designs are not incentive-compatible and do not entail real consequences for participants [37]. Thus, of the 1248 articles retrieved, published between January 2011 and July 2023, only 56 met our inclusion criteria. Specifically, we focused only on laboratory and online (hypothetical lab) experiments, excluding field experiments, discrete-choice experiments, randomized controlled trials, and quasi-experiments based on hypothetical choices or stated preferences, the latter because they lack any monetary incentive.

The selected papers were first classified according to their object of analysis. A large share of the 56 papers investigates payment schemes, whereas the remaining studies focus on several different themes such as health insurance, organ donation and market competition, making it impossible to group them into specific categories. Then, for each paper, we recorded the number and type of participants (students, medical students, or physicians) and described the experimental design and main results.

The main aspect emerging from our systematic review on the provision of health services in the lab is the need to involve more physicians in health-related experiments, in order to increase the external validity of the results. Although we are fully aware of the difficulty of bringing physicians into a lab because of their high opportunity cost, their knowledge of medical procedures and their clinical experience can make experimental results more sound and better able to support robust health policy implications. Online experiments can mitigate this issue, as they make it possible to involve larger samples of physicians at a lower opportunity cost, albeit at the expense of a partial loss of experimental control. We acknowledge that some researchers argue that physicians’ participation typically concerns field experiments more than laboratory ones. However, some experimental papers show that choosing medical students, or even nonmedical students, to act as physicians in health-related decisions concerning patient treatment may undermine the external validity of the results [1, 2, 5, 6, 47]. In this regard, both Brosig-Koch et al. [11] and Finocchiaro Castro et al. [31] show that subjects’ answers to incentivized choices vary with their background and that physicians more easily grasp the main incentives in the experimental designs. The authors conclude that experimenters need to carefully select their subject pools before testing any health economics prediction.

Another aspect raised by our systematic review is the poor connection between two fields of research: behavioral and experimental economics on the one side and health economics on the other [37]. This gap is confirmed by the lack of incentive compatibility typical of many discrete-choice experiments and quasi-experiments. As suggested by Gibson [37], some of the experiments carried out in specific areas (e.g., decisions about health-related behaviors such as smoking, diet, and alcohol consumption) could be improved by attaching real consequences to participants’ stated preferences, with an incentive-compatible scheme appropriate to each area investigated. Indeed, the variety of health-related topics addressed so far testifies to the many advantages of the experimental methodology. First, laboratory experiments are replicable: multiple sessions can be run at different times and in different contexts, also allowing the recruitment of different categories of subjects (e.g., in terms of age, gender, work experience, and specialty). Second, using dummy (i.e., computerized) players or assigning real participants to different roles makes it possible to simulate real-world interactions (e.g., patient vs doctor, PCP vs specialist), which are generally harder to observe in the field. Finally, multistage experiments make it possible to address several topics at once (e.g., altruism and competition), exploiting responses from the same subjects.

As discussed by Hansen et al. [44], when studying the decision-making of doctors or medical students, a combination of non-experimental methods (e.g., surveys, questionnaires) and economic experiments should be preferred. Indeed, surveys and focus groups can be informative preliminary steps when experimenters need to learn more about how doctors make decisions, and about their decision-making environment, before designing the experiment. Combining different methodologies may help bridge the gap between experimental economics and health economics.

In this systematic review, we have attempted to offer a comprehensive overview of a strand of literature dealing with the provision of healthcare services. This area has grown significantly over the last decade and, to the best of our knowledge, has not yet been properly reviewed. Although our work shows that the role of incentives related to payment systems is the most investigated topic, much remains to be done. For example, it is still poorly understood how, in pay-for-performance (P4P) systems, physicians’ behavior is influenced by the base payment (FFS or CAP), how patients’ characteristics influence prioritization decisions, and which features of payment system design could influence treatment decisions and improve the quality of care for different types of patients.

Additionally, although many areas of research have been explored using laboratory experiments, others remain unexplored. For example, to the best of our knowledge, no study has yet investigated waiting lists from the perspective of healthcare providers, although the subject has been widely studied in the health economics literature. Another promising yet little-explored area of research concerns the behavior of providers during peaks in demand or under extreme conditions such as pandemics.

Finally, some limitations of our systematic review are worth mentioning. First, the literature selection process might be limited by the exclusion of relevant articles that are not indexed in the SCOPUS database. Additionally, we might have missed other studies because of our keyword selection or the restricted time span. Furthermore, despite having taken all the precautions specific to the systematic review approach to allow for replicability, a certain degree of discretion cannot be ruled out. Excluding field experiments as well as experiments on health-related behavior was a critical decision for our systematic review. Consequently, although we have transparently explained the reasons behind our choices, we are aware that other researchers may have opted for different solutions.