1 Introduction

Economic theory suggests that tourism demand is influenced by different price (e.g., product price, prices of other products) and non-price factors (e.g., income, size of the market, advertising and promotion, tastes, among others) (see Song and Witt 2000, and Dwyer et al. 2010, for an overview over tourism demand modelling). Therefore, estimated tourism demand elasticities (e.g., own-price, cross-prices, income, marketing, among others) are useful not only for tourism managers to conduct an insightful demand sensitivity analysis based on prices and income, but also for short- and long-term policy recommendations.

To improve the understanding of short- and long-run properties of tourism data and to provide a framework in which hypotheses regarding growth and fluctuations in tourism can be tested, cointegration analysis has been frequently used to assess the degree of interdependence of tourism markets and tourism demand and its determinants in the long-term (e.g., Dritsakis 2004; Narayan 2004; Bonham et al. 2006; or Massidda and Mattana 2013; among others).

Particularly, the tourism demand analysis distinguishes between the demand for travel to a destination (e.g., visitor arrivals or expenditure) and the demand for related products or services (e.g., hotels, peer-to-peer accommodation) using time series analysis. In the first case, the empirical literature on the tourism industry has used cointegration methods to analyse the long-run equilibrium between tourism demand using tourism arrivals and several determinants such as the relative price of destination compared to the country of origin of tourists, the relative price for substitute destinations, and real income, among others (e.g., Webber 2001; Lim and McAleer 2001; Li et al. 2004, 2005; Dritsakis 2004). In addition, there are papers using panel data approaches such as Seetanah et al. (2010), Çalışkan et al. (2019) or Dogru et al. (2021). In general, results have shown that demand is cointegrated with its classic determinants both in the time series and panel data framework (see the literature review section below for an overview). In the second case, there are also studies focusing on the analysis of the long-run equilibrium link between tourism demand for products and services and their explanatory factors such for hotel demand (e.g., Lee 2010; Bianchi and Chen 2020; Lee and How 2019, 2022a, 2022b; among others) or restaurants (e.g., Herrington and Bosworth 2016), using time series models. However, there are no studies of tourism demand in a panel data context investigating the long-term relationship in both hotel or P2P accommodation markets, except for Gómez et al. (2021)’s analysis of the relationship between Airbnb and hotels in Mexico.

Panel data are a useful approach to investigate observations for different individuals at different moments in time (Baltagi 2001). The panel data framework allows us to consider a cross-section of listings along time which accounts for unobserved time-invariant heterogeneity. Therefore, this paper’s main contribution to the empirical literature is the analysis of the long-run equilibrium relationship between P2P tourism demand and its determinants at the listing level; that is using microeconomic information for active listings in the P2P accommodation market in a panel data framework which allows us to analyse the economic behaviour of cross-section of listings over time.

More specifically, we consider the heterogeneity of listings' demand in a panel data context. Demand of listings in P2P accommodation markets has been studied by Gunter et al. (2020), Jiménez et al. (2023), and Suárez-Vega et al. (2023). These papers have studied the relevance of elasticities of demand, cross-prices and income using spatial panel data methods. They have reported that the classical accommodation market (hotels) and peer-to-peer (P2P) accommodation market such as Airbnb are substitute goods (Günter et al. 2020; Jiménez et al. 2023), and therefore, competitors. Hence, if they increase prices disproportionately, the business might lose sales and revenues as customers transfer to local competitors, including neighbours’ listings (Gunter et al. 2020). However, they do not account for unobserved dependence between cross-sectional units (e.g., between neighbour properties and, therefore, between their demands). Therefore, we use two heterogeneous panel data cointegration models based on cross-sectional dependence: first, the standard panel cointegration model with a multi-factor error structure, as proposed by Pesaran (2006), based on the common correlated effects (CCE) estimator (which is robust to error cross-section dependence of different types, possible slope heterogeneity, and unit roots in factors), and second, the fractionally integrated heterogeneous panel data model recently proposed by Ergemen and Velasco (2017), which permits contemporaneous correlation of factors and innovations.

Our focus lies on the fractional cointegration panel data framework. In particular, the existence of fractional cointegration could indicate that the long-run equilibrium recovers slowly after deviations from it. This could have important policy implications, insofar as the transition speed to the long-run equilibrium might require the implementation of different policies. Therefore, listings’ managers and tourism policymakers would have to take this slow (rather than fast) transition into account to define demand-driven adjustments to changes in prices and income in formulating their demand policies.

The database used is a balanced panel from January 2016 to September 2021 (57 months) for Airbnb listings in the Canary Islands (Spain). The data was provided by AirDNA. We use, as previously mentioned, both standard panel cointegration and a fractionally integrated heterogeneous panel data model.

The rest of the paper is organised as follows. Section 2 defines the theoretical microeconomic background. Section 3 reviews the empirical literature on panel data demand models and cointegration and describes the P2P market in the Canary Islands. Section 4 focuses on the fractional cointegration panel data methodology. In Sect. 5, we describe the variables in the database and main results. Finally, in Sect. 6 we conclude.

2 Theoretical background

The analysis of the preferences underlying the tourism demand at the listing level is based on the microeconomic theory of Marshallian demand assuming that tourism is an ordinary and a normal good (Gunter et al. 2020). Following Bonham et al. (2009), empirical models of tourism demand borrow heavily from consumer theory which predicts that the optimal consumption level depends on the consumer’s income, the price of the tourism good, the prices of related goods (substitutes and complements), and other demand shifters.

A model of tourism demand at the listing level to be estimated and tested can be written as:

$$ D_{i} = f\left( {Y_{i} ,P_{i} ,P_{ik}^{l} ,P_{i}^{h} ,Z} \right), $$
(1)

where \(D_{i}\) is the tourism demand of the i-th listing; \(Y_{i}\) is the income of the visitors at the i-th listing; \(P_{i}\) denotes the price of i-th listing; \(P_{ik}^{l}\) is the price of others listings near i-th listing at the k-th radius;\(P_{i}^{h}\) is the price of the tourism product in the competing accommodation market for the i-th listing, for example, hotels (h); and Z are other tourism demand factors which can influence on tourism demand (e.g., tastes, marketing, among others).

Marshallian demand expresses demand as the quantity vector which maximises an individual customer’s utility subject to their income, as well as the prevailing price vector. In this context, there is an inverse relationship between own price and demand (ordinary good), income has a positive effect on the demand and there is a substitution effect (that is tourism is a normal good). The algebraic signs for own-price, cross-prices and income elasticities are negative, positive and positive, respectively.

Estimation of price elasticity of demand is important for tourism managers seeking to maximize sale revenues. Knowledge of the income elasticity can help managers identify potential markets for their products. Cross-price elasticity can help tourism managers determine if their products have substitutes, complements or are independent. This information is essential for formulating pricing strategy and analysis of risk associated with various products.

Demand is under homogeneity a function of real income and relative destination and substitute prices,

$$ D_{ij} = f\left( {{{Y_{i} } \mathord{\left/ {\vphantom {{Y_{i} } {P_{i} }}} \right. \kern-0pt} {P_{i} }},{{P_{ik}^{l} } \mathord{\left/ {\vphantom {{P_{ik}^{l} } {P_{i} }}} \right. \kern-0pt} {P_{i} }},{{P_{i}^{h} } \mathord{\left/ {\vphantom {{P_{i}^{h} } {P_{i} }}} \right. \kern-0pt} {P_{i} }},Z} \right), $$

and will be estimated in the empirical demand modelling.

3 Literature review

3.1 Demand models in a panel data context

There is a large empirical literature on tourism demand for several countries and regions using both static and dynamic panel data models, in both standard and spatial contexts.

Using panel data, several authors have analysed the economic determinants of travel demand. For example, Sakai et al. (2000) considered the effects of Japanese demographic change, while Garín-Muñoz and Amaral (2000) considered the number of tourists, income per capita, exchange rate and real prices for Spain. For their part, Ledesma-Rodríguez et al. (2001) analysed demand for the Spanish island of Tenerife, Naudé and Saayman (2005) for tourist arrivals in 43 African countries, while Roget and González (2006) investigated rural tourism demand in Galicia (Spain). In general, the factors influencing demand that have been analysed include prices of tourism products at the destination relative to the origin (relative prices), the prices of tourism products in competing destinations (substitute prices), tourist’s income levels, and exchange rates (Li et al. 2005; Song and Li 2008). Other important determinants include marketing policies Ledesma-Rodríguez et al. (2001), attacks on tourists (Bonham et al. 2006), financial crises (Song et al. 2011), climate change (Moore 2010), and political stability (Saha and Yap 2014).

Several papers modelling tourism demand are based on dynamic panel data models with lagged demand as an explanatory variable (e.g., Garín-Muñoz 2006; Garín-Muñoz and Montero-Martín, 2007). The main intuition of dynamics in tourism demand is the occurrence of repeat visits, word-of-mouth recommendations, time lags in implementing a decision, information asymmetry, supply rigidities, and long-term adjustments (see Morley 2009; Song et al. 2012). Tourism demand is most commonly represented by tourism arrivals (e.g., Garín-Muñoz 2006; Naudé and Saayman 2005; Seetaram 2010), but also by tourism expenditure (e.g., Li et al. 2004; Lyssiotou 2000; Wu et al. 2012) and occupancy rates (Jiménez et al. 2022). These have obtained mixed estimated coefficients of the lagged dependent variable: Some are highly significant and positive (Song and Witt 2003), suggesting a strong positive willingness to return and/or a high word-of-mouth effect. However, others are negative (Naudé and Saayman 2005), suggesting a negative willingness. Further studies have even indicated that the lagged dependent variable is the principal tourism demand determinants (Song et al. 2010; Song and Witt 2003).

Only Jiménez et al. (2022) have studied the habit persistence effect in the P2P accommodation sector. They use a dynamic panel data model at the Spanish city level for 2014–2017 to analyse the effects of Airbnb on the size of local tourism markets and find a positive habit persistence effect for both Airbnb occupancy rates and for hotel overnight stays.

Finally, tourism demand research has also been conducted with respect to the spatial dimension, considering the relationships between distinct units due to their locations and the tourism demand determinants of units of interest, such as destinations, regions, or lodging units, over time. This literature is based on spatial spillover effects (i.e., a higher mutual influence of units closer together compared to those farther apart).

Papers considering the spatial dimension have studied the effect of several determinants, such as inbound tourism, prices, and supply, among others. Inbound tourism has been analysed for Australia (Deng and Athanasopoulos 2011) and for China (e.g., Zhang 2009; Yang and Wong 2012; Ma et al. 2015; Yang and Zhang 2019). With respect to online P2P lodging markets such as Airbnb, price determinants have been studied for the United Kingdom (Voltes-Dorta and Inchausti-Sintes 2021), Estonia (Önder et al. 2019) and Spain (Gutiérrez et al. 2017; Adamiak et al. 2019; Eugenio-Martín et al. 2019; Boto-García et al. 2021). Gunter et al. (2020) is the first to analyse Airbnb demand in a spatial econometric model with occupancy rates as dependent variable. Among other results, these authors have found that Airbnb demand in New York City is price inelastic and that the city’s traditional lodging businesses and nearby Airbnb listings are substitutes for the studied Airbnb listings. More recently, Suárez-Vega et al. (2023) proposed a dynamic spatial panel data model to estimate direct, indirect, and total marginal effects of several variables on tourism demand.

3.2 Demand models and cointegration

Tourism demand studies have also used time series econometric models to investigate cointegration between tourism demand and its determinants (see Song et al., 2019, for an overview). Such studies analyse tourism demand by testing for cointegration and estimating the cointegrating vectors, via error correction models (ECM).

3.2.1 Time series models and cointegration

Lathiras and Sriopoulos (1998) identified economic factors that were interrelated in the long-run with tourism demand and identified a statistically adequate short-run dynamic specification with appropriate forecasting properties. Kim and Song (1998) used ECMs to study the long- and short-term inbound tourism demand in South Korea with respect to Japan, US, UK, and Germany, as the four main tourist origin countries. Salman (2003) used the Engle-Granger cointegration approach to estimate the long-run relationship between monthly tourist flows to Sweden and income, price, and exchange rate. Kadir and Karim (2009) analysed cointegration and used error correction models to estimate a tourism demand model for Malaysia with respect to US and UK tourists. They obtained a long-run relationship between tourist arrivals and several explanatory factors and have found that US and UK tourist arrivals in Malaysia are significantly affected by income and the relative price of tourism.

More recently, Husein and Kara (2020) estimated the long-run tourism demand for Puerto Rico from the USA for 1970–2016. They accounted for the asymmetric impact of income changes on tourism demand, because of potentially asymmetric income elasticity over business cycles. In particular, they explored asymmetric cointegration with the nonlinear autoregressive distributed lag (ARDL) model of Shin et al. (2014).

Several papers used Johansen’s maximum likelihood method to test for cointegration and to estimate the cointegrating vectors within a vector autoregressive (VAR) model (e.g., Webber 2001, Lim and McAleer 2001, Dritsakis 2004, Bonham et al. 2006, 2009, or Massidda and Mattana 2013). Dritsakis (2004), for example, investigated changes in the long-term demand for tourism of Germans and Britons to Greece. He explained the demand for tourism with tourism prices in Greece, income in origin countries (Germany and Great Britain), and transportation cost and exchange rates between the three countries. Bonham et al. (2006) used vector error correction models (VECM) to make dynamic visitor predictions, in their specific case to determine whether tourism in Hawaii had completely recovered from 9/11 and other tragic international events. Their paper also considered policy alternatives to facilitate the recovery of international tourism to the US. Bonham et al. (2009) also estimated a fully identified VECM of Hawaii tourism, with relevant demand and supply-side effects. They identified reasonable long-run equilibrium relationships and demonstrated satisfactory forecasting performance using Diebold–Mariano forecast-accuracy tests.

Massidda and Mattana (2013) used a structural VECM for the Italian economy to investigate the contemporaneous and short- and long-run relationships across per capita international tourism arrivals, real GDP, and total international commercial transactions. They found bidirectional causality between real GDP and tourism arrivals and unidirectional long-run causality running from real GDP to total international commercial transactions and from total international commercial transactions to arrivals. From a structural model and the impulse-response functions of “meaningful” shocks they obtained valuable information for the design of policies and business strategies.

On the other hand, some papers analysed the long-term relationship between demand for different tourism related products. Lee (2010), for example, studied interactions between hotel room rates and the number of international inbound tourists for Singapore. He tested for the existence of any cointegrating relationship between them using the bounds testing approach to cointegration and found no evidence that Singapore’s hotel room rates and international inbound tourists are cointegrated. Also, Bianchi and Chen (2020) used a vector error correction model of hotel demand to incorporate both the short-run demand fluctuations and the long-run tourism growth. They found no evidence of a long-run market equilibrium between all three endogenous variables in the model, namely hotel nights, Swiss real gross domestic product, and real exchange rate of the Swiss franc. Also, Lee and How (2022a, b) investigated the influences of Singapore Economic Policy Uncertainty (EPU) on hotel room rates of two types of hotels, luxury and economy, in Singapore using the cointegration technique and error correction model. Their findings showed that the national EPU had a negative impact on economy hotel room rates, but no impact on luxury hotel room rates; and the global financial crisis had a negative effect on luxury hotel room rates but had no effect on economy hotel room rates. Finally, Herrington and Bosworth (2021) studied the demand for restaurants. In particular, they tested for cointegration between average unit sales and advertising and found that of the 184 individual restaurant chains examined, 135 were co-integrated.

3.2.2 Panel data models and cointegration

Regarding the panel data approach, Narayan (2004) used cointegration techniques and error correction models to examine the short- and long-term relationship between tourist arrivals in Fiji, real disposable incomes and relative hotel and substitute prices for the period 1970–2000. The paper used a cointegration method – the bounds testing methodology developed within the autoregressive distributed lag (ARDL) model (Pesaran et al. 1999). In the long-run, relative hotel and substitute prices affect visitor arrivals negatively, and income in the origin countries affect positively. Seetanah et al. (2010) modelled inbound tourism demand for South Africa with a gravity model. They estimated price and income sensitivity and the influence of other important determinants of tourist flows, such as socio-political factors and the location of markets. Using Pedroni’s panel cointegration estimation, they found that tourist flows are sensitive to tourism price changes in competing destinations and also to price changes in South Africa.

Çalışkan et al. (2019) analysed the link between incoming tourists and export and import volumes of Turkey and 13 Silk Road countries using panel ARDL. They found a link between international trade and tourist flows, which indicates the importance of international trade for the development of tourism.

More recently, Dogru et al. (2021) presented theoretical and methodological inaccuracies in tourism demand modelling and the use of panel data in this context. In a guide of the use of panel data, they, among others, recommend nonlinear empirical approaches or Bayesian methods.

To our knowledge, the tourism demand long-run relationship has not been studied allowing for cross-sectional heterogeneity in a panel data context, except for the paper of Gómez et al (2021) which analyses Airbnb’s influence on hotel occupancy in Mexico in a single approach with hotels grouped according to their category. However, these authors study the cross-sectional dependence, cross-sectional unit roots, and causality between these variables, but then analyse the cointegration relationship using Kao and Pedroni’s cointegration tests without employing recent advances such as the Common Correlated Effect (CCE) estimator considering the cross-sectional dependence. Given the rise of the sharing economy, tourism demand models should incorporate alternative online lodging platforms, such as Airbnb or HomeAway (Dogru et al. 2021), and analyse cointegration in the P2P accommodation market using panel data cointegration techniques. For this purpose, we therefore propose use of a cross-sectional dependence model allowing for slope heterogeneity in a fractional cointegration panel data framework, which extends the methodological framework of the CCE estimator.

4 Econometric methodology

To estimate the tourism demand model in a panel data context, we account for cross-sectional correlation and persistence of the variables. First, we assume this persistence being governed by a unit-root. To estimate such model, we use the CCE estimator of Pesaran (2006) (see Appendix 1). Second, we assume a more general persistence governed by long-range dependence. To estimate the resulting model, we use the fractionally integrated heterogeneous panel data model proposed by Ergemen and Velasco (2017). It extends the factor structure and the CCE estimation method of Pesaran (2006) to a fractional context. Unlike Pesaran (2006), both common factors and innovations can be fractionally integrated. The resulting persistence is thus more general than a unit root. Finally, the model allows for fixed effects. We, further estimate Rodríguez-Caballero (2022) approach which extends Ergemen and Velasco (2017)’s methodology by allowing for multi-level cross-sectional dependence.

Specifically, Ergemen and Velasco (2017)’s panel data model for N individual observations and T periods is defined as:

$$ \begin{aligned} & y_{it} = \alpha_{i} + \beta_{i0}^{\prime } x_{it} + \gamma_{i}^{\prime } f_{t} + \Delta_{t}^{{d_{i0} }} e_{1it} ,\,\;i = 1, \ldots ,N,t = 1, \ldots ,T, \\ & x_{it} = \mu_{i} + \Gamma_{i}^{\prime } f_{t} + \Delta_{t}^{{\vartheta_{i0} }} e_{2it} , \\ \end{aligned} $$
(2)

where \(y_{it}\) depends on potentially endogenous covariates \(x_{it}\) and m unobserved common factors \(f_{t} \,.\) These are fractionally integrated of order \(\omega_{i}\) (\(f_{t} \sim I\left( {\omega_{i} } \right)\)), i = 1,..,m. \(y_{it}\) and \(x_{it}\) depend on these factors according to their factor loadings \(\lambda_{i}\) and \(\Gamma_{i}\) respectively. In addition to idiosyncratic covariance stationary shocks \(e_{1it}\) and \(e_{2it}\), there are covariate-specific fixed effects \(\alpha_{i}\) and \(\mu_{i}\). These fixed effects would cause the mentioned endogeneity if not properly accounted for. \(\Delta_{t}^{\delta }\) denotes the truncated fractional filter defined as \(\Delta_{t}^{\delta } = \sum\nolimits_{j = 0}^{t - 1} {\pi_{j} \left( \delta \right)L^{j} }\), with \(\pi_{j} \left( \delta \right) = {{\Gamma \left( {j - \delta } \right)} \mathord{\left/ {\vphantom {{\Gamma \left( {j - \delta } \right)} {\left[ {\Gamma \left( {j + 1} \right)\Gamma \left( { - \delta } \right)} \right]}}} \right. \kern-0pt} {\left[ {\Gamma \left( {j + 1} \right)\Gamma \left( { - \delta } \right)} \right]}}\), \(\Gamma \left( \tau \right) = \infty ,\;\tau = 0, - 1, - 2, \ldots\), but \({{\Gamma \left( 0 \right)} \mathord{\left/ {\vphantom {{\Gamma \left( 0 \right)} {\Gamma \left( 0 \right)}}} \right. \kern-0pt} {\Gamma \left( 0 \right)}} = 1\), and L denoting the lag operator.

The memory parameters of interest are the residual integration order, \(d_{i0}\), and the memory of the defactored (unobserved) explanatory variable, \(\vartheta_{i0}\). It is worth noting that the memory of \(y_{it}\) is \(\max \left\{ {\vartheta_{i0} ,d_{i0} ,\max_{i} \omega_{i} } \right\}\) and that of \(x_{it}\) is \(\max \left\{ {\vartheta_{i0} ,\max_{i} \omega_{i} } \right\}\). Besides \(\vartheta_{i0}\) and \(d_{i0}\), we are interested in the potential fractional cointegrating vector \(\beta^{\prime}_{i0}\). The cointegrating error \(y_{it} - \alpha_{i} - \beta^{\prime}_{i0} x_{it} - \lambda_{i}{\prime} f_{t}\) is asymptotically stationary for \(d_{i0} < 0.5\) and asymptotically nonstationary for \(d_{i0} \ge 0.5\). (Fractional) cointegration requires \(d_{i0} < \vartheta_{i0}\) which can be tested with the t-test, \(t = (\hat{\vartheta }_{i0} - \hat{d}_{i0} )/s.e.(\hat{\vartheta }_{i0} - \hat{d}_{i0} )\), where \(s.e.(\hat{\vartheta }_{i0} - \hat{d}_{i0} )\) is the standard error of the difference of memory estimates.

To estimate (2), fixed effects are removed by first differences:

$$ \begin{aligned} & \Delta y_{it} = \beta_{i0}^{\prime } \Delta x_{it} + \gamma_{i0}^{\prime } \Delta f_{t} + \Delta_{t}^{{1 - d_{i0} }} e_{1it} ,\,\;i = 1, \ldots ,N,t = 1, \ldots ,T, \\ & \Delta x_{it} = \Gamma_{i}^{\prime } \Delta f_{t} + \Delta_{t}^{{1 - \vartheta_{i0} }} e_{2it} . \\ \end{aligned} $$

After projecting the individual time series on their (fractionally) differenced cross-section averages, the heterogeneous slope and memory parameters are estimated by generalized least squares. Finally, an equation-by-equation conditional-sum-of-squares approach renders estimates of the relevant memory parameters. Note that, contrary to the unit root case, estimates and test statistics have standard properties.

The common-correlation mean-group (CCMG) estimate is obtained as average of the individual slope coefficients \(\hat{\beta }_{i0} \left( {\hat{d}_{i} ,\hat{\vartheta }_{i} } \right)\) depending on the individual memory parameters \(\hat{d}\) and \(\hat{\vartheta }\):

$$ \hat{\beta }_{CCMG} \left( {\hat{d},\hat{\vartheta }} \right) = \left[ {\frac{1}{N}\sum\nolimits_{i = 1}^{N} {\hat{\beta }_{i0} \left( {\hat{d}_{i} ,\hat{\vartheta }_{i} } \right)} } \right]. $$

The hypothesis \(\beta_{CCMG} \left( {d,\vartheta } \right) = \beta_{0}\) can then be tested by the t-test:

$$ t_{CCMG} = \sqrt N \frac{{\left( {\hat{\beta }_{CCMG} \left( {\hat{d},\hat{\vartheta }} \right) - \beta_{0} } \right)}}{{\hat{\Omega }_{w} \left( {\hat{d},\hat{\vartheta }} \right)^{1/2} }}, $$

where \(\hat{\Omega }_{w} \left( {\hat{d},\hat{\vartheta }} \right)\) is an estimate of the asymptotic variance–covariance matrix:

$$ \hat{\Omega }_{w} \left( {\hat{d},\hat{\vartheta }} \right) = \frac{1}{N - 1}\sum\nolimits_{i = 1}^{N} {\left( {\hat{\beta }_{i0} \left( {\hat{d}_{i} ,\hat{\vartheta }_{i} } \right) - \hat{\beta }_{CCMG} \left( {\hat{d},\hat{\vartheta }} \right)} \right) \times \left( {\hat{\beta }_{i0} \left( {\hat{d}_{i} ,\hat{\vartheta }_{i} } \right) - \hat{\beta }_{CCMG} \left( {\hat{d},\hat{\vartheta }} \right)} \right)}^{\prime } . $$

Rodríguez-Caballero (2022) extends Ergemen and Velasco (2017)’s methodology by dividing the panel into blocks and allowing for different factors in each of these blocks (in addition to general factors for the whole panel). This leads to a multi-level cross-sectional dependence. In particular, (2) changes to:

$$ \begin{aligned} & y_{r,it} = \alpha_{r,i} + \beta_{r,i0}^{\prime } x_{r,it} + \kappa_{i}^{\prime } g_{t} + \gamma_{r,i}^{\prime } f_{r,t} + \Delta_{t}^{{d_{r,i0} }} e_{1r,it} ,\,\;i = 1, \ldots ,N_{r} ,t = 1, \ldots ,T,r = 1, \ldots ,R, \\ & x_{r,it} = \mu_{r,i} + K_{i}^{\prime } g_{t} + \Gamma_{r,i}^{\prime } f_{r,t} + \Delta_{t}^{{\vartheta_{r,i0} }} e_{2r,it} , \\ \end{aligned} $$

where R denotes the number of blocks, Nr the number of units in block r, and where all variables, factors, and parameters from (2) are now block r specific. In addition to the block-specific factors \(f_{r,t}\), there are pervasive factors \(g_{t}\) that affect all blocks. \(\kappa_{i}^{{}}\) and \(K_{i}^{{}}\) are their respective factor loadings.

The author highlights that neglecting the presence of this non-pervasive cross-sectional dependence may cause possible biases. To estimate the model, it suffices to apply Ergemen and Velasco (2017)’s methodology to each of the blocks separately.

Before estimating these panel models with an underlying factor structure, we motivate this methodology by applying Bailey et al. (2016)’s Exponent of Cross-Sectional Dependence test. This allows estimating the degree of dependence \(\alpha \) between the different lodgings. Only for strong or semi-strong dependence, i.e. for \(\alpha >0.5\), a factor structure is sensible (see Bailey et al 2016, for further details and Chakraborty and Mazzanti (2020) and Rodríguez-Caballero and Vera-Valdés, 2021, for applications).

5 Empirical analysis

5.1 Data and variables

The Canary Islands is a Spanish archipelago located 2000 km from Peninsular Spain in the Atlantic Ocean. It has been a popular holiday destination for Europeans – especially UK, Germany, and mainland Spain—since the 1960s, with more than 15 million visitors in the pre-pandemic year 2019. The tourism is varied: sandy beaches, nature (including three National Parks) and urban areas, especially, in the two main islands Gran Canaria and Tenerife. Its sharing accommodation market has grown rapidly over the last decade, with above 100,000 listings in the Airbnb and HomeAway online platforms. These include apartments (most abundant type with over 50% of the total), houses, villas (the highest priced type of accommodation), condominiums (jointly with apartments the lowest in average price), bungalows, cottages, and townhouses. These types of accommodation represent more than 90% of the total and are managed by hosts both with one and with multiple properties with the former obtaining lower revenue per property. An increasing number of properties per host has continuously increased the revenue. Since the 2010s, the sharing accommodation sector’s market share has continuously increased, with around 8.6% in 2016 and 11.9% in 2020 (ISTAC 2021).

This paper uses monthly observations of Airbnb properties in the Canary Islands (Spain) from January 2017 to September 2021. We obtain a balanced panel by only considering properties with data available for the whole period. For reasons of consistency, only establishments classified as “Entire home/apt” have been included in the sample. In this group of listings, the three most frequent types were considered: Villa (49.44%), Apartment (41.01%), and House (9.55%). The resulting database after applying these selection criteria contained 1068 listings with information over 57 months, equivalent to a total of 60,876 observations. Figure 1 illustrates the spatial distribution of the Airbnb listings in the study period in the Canary Islands and the spatial distribution of the hotels and apartments operating in 2020 in the region. The distributions of hotels and Airbnb houses are very similar, being concentrated principally in coastal areas mainly associated with beach tourism. However, the arrival of Airbnb has increased the supply of accommodation in the inland areas of the islands of Gran Canaria and Lanzarote. It should be added that there are no properties that have been operational during the entire period on the two smaller islands (La Gomera and El Hierro).

Fig. 1
figure 1

(Source: AirDna and ISTAC)

Spatial distribution of the sample of Airbnb listings and hotels/apartments in the Canary Islands during the study period

In our empirical study, we work with the following variables from the general model structure defined by Eq. (1) in Sect. 2. The most popular dependent variable in demand studies since 2000 has been visitor arrivals (Song and Li 2008), although there are other variables such as the number of tourist nights spent by tourists or the tourism expenditure by visitors in a destination which have also been used in several studies (Song and Witt 2000). In this paper, we use the occupancy rate (OCR), instead, which depends on the number of days spent, as a proxy for demand following recent papers on tourism demand such as Gunter et al. (2020), Suárez-Vega et al. (2023), and Jiménez et al. (2023). Proxies for demand determinants tend to differ substantially between studies (Song and Witt 2000; Li et al. 2005). They generally incorporate variables such as income, relative prices, substitute prices, but also factors such as travel costs, exchange rates, deterministic trends, and dummy variables for some special events.Footnote 1 We consider the demand for competitors in the vicinity – Airbnb listings and hotels – instead. Their prices represent the costs of substitute products. Finally, the real GDP for the main tourist country visitors serves as proxy for tourist income.

The empirical demand model is based on the following variables:

  • OCR: average monthly occupancy of Airbnb housing. This information is provided monthly by Airbnb through AirDNA (https://www.airdna.co/) and it is measured as the Total Booked Days/(Total Booked Days + Total Available Days). This variable varies both temporally (monthly) and spatially at the level of the listing location. Calculation only includes vacation rentals with at least one booked night.

  • RADR: the relative average daily rate of the listing for a given month. The ADR (average daily rate) is also provided by AirDNA and represents the average price per day of rent. The RADR for a given listing in each month is then calculated as the proportion representing the own ADR with respect to the average of the ADR of the Airbnb listings in a 5 km surrounding area. Values greater than 1 imply that the own ADR exceeds the average ADR in the surrounding area. RADR has a similar temporal and spatial distribution to that of the OCR.

  • HADR: the average daily rate of hotels in the municipality where the house is located. This information is supplied by the Canary Institute of Statistics (known by its initials in Spanish as ISTAC). The variable varies monthly (in time) and the spatial variation is at the municipal level.

  • GDP: the average real GDP of visitors to the island where the house is located. It is calculated as the weighted average (in function of number of tourists) of the real GDPs of the seven countries that provide the most visitors to the islands (Germany, Belgium, Denmark, Great Britain, Holland, Sweden and Spain). Tourists from these countries make up 78.56% of the total number of tourists visiting the Canary Islands in the study period (January 2017–September 2021). Data concerning the number of visitors to each island by origin and month were provided by ISTAC (2021). Quarterly real GDP series (in million euros) for these countries were obtained from EUROSTAT (2021). The quarterly GDP was equally distributed among the three months that make up the quarter. Finally, the values were updated by taking 2017 as the base year and using the corresponding harmonized index of consumer prices (HICP). GDP varies spatially by island.Footnote 2

The variables are described in Table 1 (Panel A). The average occupancy (OCR) for the properties in the sample is 38%, with some houses having no occupancy for a month while others are occupied the whole month. The relative average daily rate (RADR) is almost 1, which would suggest a balance of the ADR of the listings with respect to those in the surrounding area. However, there are cases where the differences are significant, reaching almost 36 times the average price supplied in the neighbourhood. Finally, the ADR for hotels in the region is €80.70, with fluctuations up to almost €153.

Table 1 Descriptive statistics for the whole and restricted sample (not seasonally adjusted)

Especially for OCR and RADR many values are missing. Therefore, we drop all listings in which more than 10 observations are missing in at least one of the variables. For the remaining listings, we replace missing values with 0. This reduces the number of listings to N = 199. Table 1 (Panel B) shows the descriptive statistics of this new sample. While HADR and GDP are similar, OCR and RADR are larger in the restricted sample, reflecting the fact that missing data is concentrated in these two variables.

To avoid seasonal “noise” in the standard and fractional cointegration analysis, all data are seasonally adjusted using the X-11 decomposition method initially developed by the US Census Bureau in 1965.Footnote 3 The X-11 method uses weighted averages over a moving window of the time series. Data for each listing in the panel was converted into a time series. The series which featured seasonality, as detected by the QS test (Maravall 2012), were replaced with their seasonally adjusted version.

As mentioned before, to further motivate the factor structure with strong dependence between the underlying variables, we estimate the cross-sectional exponent. Table 2 confirms that the dependence is either semi-strong or strong. In particular, for all four variables (both in levels and the more relevant first-differences), the 95% confidence intervals lie above 0.5. Thus, we conclude that a factor structure can account for the dependence.

Table 2 Estimation of cross-sectional exponent and test of Bailey et al (2016)

5.2 Estimation results

In this section, we present the standard and the fractional cointegration analysis.

To obtain a linear relationship between the variables they are transformed to natural logarithms. The regression coefficients are then the price elasticity of demand of the listing relative to the neighbour listings (RADR), price elasticity for substitute accommodation in form of hotels and apartments (HADR), and income elasticity (GDP).

All the tests and estimations presented in this paper were done using STATA v.17 for the CCE estimator and MATLAB R2018b for the fractional integration and cointegration models. It should be noted that the CCE results are obtained with three cross-sectional lags. Also, the number of variables in the mean group regression is 597 and the number of variables partialled out is 3383 in the CCE method.

Next, we briefly describe the individual slope and mean group estimates.

5.2.1 Individual slope estimates and fractional cointegration analysis

Because the number of listings is high (N = 199), for the sake of brevity we do not show the estimated coefficients for all properties. Instead, we plot the individual CCE and fractional integrated heterogeneous panel data estimates in Figs. 2 and 3, respectively. These figures include point estimates and their 95% confidence intervals for each variable: the relative ADR of properties in a radius of 5 km, the ADR of hotels in the municipal demarcation of the listings and GDP.

Fig. 2
figure 2

Individual slope estimates and confidence intervals from CCE. Note. Individual slope estimates (together with the corresponding confidence intervals at the 95% confidence level)

Fig. 3
figure 3

Individual slope estimates from the fractionally integrated heterogeneous panel data model. Note. Individual slope estimates (together with the corresponding confidence intervals at the 95% confidence level)

In both cases, the estimates of the three slope coefficients are quite variable, with positive and negative estimates for different listings. For example, 40 listings feature negative RADR coefficients, while 17 feature positive HADR coefficients and 8 positive GDP coefficients, all statistically significant (at the 5% significant level). On the other hand, we observe that the sample variability of the CCE estimates is higher than that of the fractional integrated heterogeneous panel data estimates. In the latter case, 98 listings have a negative RADR coefficient, and 25 and 17 listings have positive coefficients for HADR and GDP, respectively, all statistically significant at the 5% significance level.

Another interesting aspect concerns the memory estimates in the fractional integrated heterogeneous panel data model. Figure 4 shows the point estimates together with the 90% confidence intervals of the memory of defactored log OCR, \(\hat{\vartheta }_{i0} ,\) (panel A) and the residual (cointegration error) integration order, \(\hat{d}_{i0}\), (panel B) by lodging. Figure 5 shows the corresponding distributions of the memory estimates of the log OCR and of the residual integration order. Clearly, for most listings both are between 0 and 1, giving support for the fractional analysis. Thus, the OCR show a nonstationary mean-reverting long-memory behaviour with long-lasting effects of shocks. Similarly, the cointegration errors (i.e., the deviations from the long-run equilibrium) are also quite persistent.

Fig. 4
figure 4

Memory of defactored log OCR, residual integration order and cointegration t-statistics. Note. Point estimates together with the 90% confidence intervals of the memory of defactored log OCR and residual integration order. t-statistics for testing cointegration together with the 90% critical values

Fig. 5
figure 5

Distribution of the memory of defactored log OCR and the residual

Next, we formally analyse the fractional cointegration properties. As previously mentioned, fractional cointegration requires that \(\hat{d}_{i0} < \hat{\vartheta }_{i0} .\) which can be tested using the t-test, \(t = (\hat{\vartheta }_{i0} - \hat{d}_{i0} )/s.e.(\hat{\vartheta }_{i0} - \hat{d}_{i0} )\). Figure 4 panel C shows the t-statistics for testing cointegration together with the 90% critical values by lodging. Figure 6 shows the distribution over these t-statistics. It turns out that in 45 out of the 199 listings (around 23%) fractional cointegration is confirmed (the mass on the right of the vertical line at 1.64 in Fig. 6, which represents the 5% significance level for a one-sided test and the number of dots above the critical value in Fig. 4, panel C). We take this as evidence for a long-run relationship between the occupancy rate and the prices and income. In fact, the listings could be sufficiently similar, and we would expect a similar long-run relationship for most of them. Given that the time dimension is quite short, we further would expect that the cointegration detection mechanism does not have too high power. Finally, since cointegration is found in considerably more than 5% of the properties, we argue that there is indeed a long-run relationship at the listings level.

Fig. 6
figure 6

Distribution of the t-statistics for the test of fractional cointegration. Note: Frequency distribution over the t-test statistics for fractional cointegration. The vertical line at 1.64 indicates the region in which we find cointegration at the 5% significance level

We also have applied the standard panel cointegration tests by Pedroni (1999, 2004) and by Westerlund (2005). These test the null of no-cointegration against either homogenous (with all units behaving the same) or heterogeneous alternatives. They further deal with serial correlation and cross-sectional dependence (Westerlund 2005). The tests were performed in Stata using the command xtcointtest, with the autoregressive order (to deal with serial correlation) chosen by AIC in the Pedroni panel cointegration test, and a time trend and time demeaning in both tests. Finally, the alternative is cointegration in at least some panels in the Westerlund cointegration test. All tests clearly find cointegration (with p values of 0.00). In consequence, evidence of cointegration is stronger with the standard methods. However, while these methods have found cointegration for all lodgings, with the fractional method we have found cointegration for a subset of the lodgings.

5.2.2 Mean group estimates

The cross-sectional and slope heterogeneity estimates in the previous section show evidence of cointegration for some cross-sectional units. In this sense, the long-term parameters of the tourism demand models can be estimated because the variables have a cointegration relationship (Dogru et al. 2021).

Table 3 shows the mean group estimates from the CCE and the fractionally integrated heterogeneous panel data model. Both show a significant negative effect of the relative ADR of the facility and a positive significant effect of the ADR for the hotels. For example, an increase of 1% in the relative listing prices (RADR) implies a decrease of 0.25% in OCR, while an increase of 1% in the HADR implies an increase of 0.07% of OCR, both in the fractionally integrated case. The estimated elasticities are notably larger with the CCE. Therefore, demand is own-price-inelastic and traditional accommodation and P2P in the Canary Islands are substitute goods. Furthermore, in both, the GDP of the origin country is insignificant, indicating that the income of the origin country of tourists does not affect the demand at the destination. Therefore, economic policies need not account for this variable.

Table 3 MG estimation results

These results are in line with the empirical literature such as Gunter et al. (2020) for the city of New York, and Jiménez et al. (2023) for Spanish cities. For example, Gunter et al. (2020) showed that Airbnb demand to New York City is price-inelastic, which is consistent with earlier findings for Vienna (Gunter and Önder 2018). Also, their results suggest that Airbnb listings in New York City are substitutes for the traditional accommodation industry. Other authors such as Guttentag and Smith (2017) also showed that Airbnb is a substitute for hotels and other accommodations. However, Gunter and Önder (2018) argue that the city’s Airbnb offer is a complement, rather than as a substitute, for the traditional accommodation industry in Vienna. Nevertheless, with regard to income, our results indicate that there is no influence. But, Gunter et al. (2020) showed that Airbnb demand in New York City is income-elastic, i.e., Airbnb accommodation is a luxury good. Finally, Jiménez et al. (2023) studying several Spanish cities where Airbnb works, have found that there is not only a substitute relationship between Airbnb and hotels, but also that the greater number of beds offered by Airbnb has led to an increase in the total number of visitors received by this type of firms (due to higher Airbnb occupancy rates).

Finally, our paper complements the recent work of Suárez-Vega et al. (2023), who analysed the dynamic and spatial properties of the substitution effect between P2P and traditional accommodation (hotels and apartments) using a dynamic spatial demand model with monthly data. For the period before COVID-19, they find positive autocorrelation in the occupancy rates, own-price-elastic demand, and significant substitution with its competitors (hotels and apartments) in the short-run. Both price sensitivity and substitution effect increase in the long-run. Whereas they analysed the dynamic properties in a spatial context, we focus on the long-run equilibrium demand in a time-series panel context. Note that the COVID-19 outbreak considerably distorted price and income elasticities in their study, an aspect which we have not studied in this paper but leave for future research.

5.3 Robustness checks

5.3.1 Results for non-seasonal adjusted time series

In this section, as a robustness check, we repeat the preceding analysis with data which is not seasonally adjusted. As mentioned in footnote 3, we might expect that the factors collect the seasonal noise (see Camacho et al 2015). Table 4 shows that the estimated MG effects are comparable to the ones obtained with the seasonal adjusted data. In particular, with the CCE approach results are negative for all coefficients and statistically significant at 5% significance level for Log RADR and Log GDP, while with the fractional approach we obtain significant negative results for Log RADR and positive for Log HADR.

Table 4 MG estimation results (data without seasonal adjustment)

Therefore, seasonal and non-seasonal adjusted data lead to similar results for the fractional panel model but not for the classic CCE. In any case, recall that the mean group estimates of the CCE are affected by the high variability of the individual slope estimates.

5.3.2 Analysis by provinces

In this section we repeat the fractional analysis for seasonal adjusted series but now applying Rodríguez-Caballero (2022)’s approach which allows for multi-level cross-sectional dependence. For doing so, we split the sample into two blocks of lodgings: the ones belonging to the province Santa Cruz de Tenerife which comprises of the islands Tenerife and La Palma (with 49 observations) and the province Las Palmas which comprises of the islands Fuerteventura, Gran Canaria, and Lanzarote (with 150 observations).Footnote 4 Table 5 gives the mean group estimates by province and for both provinces combined.

Table 5 MG estimation results by province

These results confirm that also separated by the provinces Log RADR keeps being negative and significant, however, Log HADR stops being so for the province of Las Palmas. Most importantly, for both provinces together, but allowing for different factors in each province, the mean group estimators resemble the ones found in Table 2 for the Ergemen and Velasco (2017) approach.

6 Conclusions

We analyse the long-run equilibrium relationship (cointegration) at the listing level between P2P occupancy rates (as proxy of their demand) and several exogenous variables, in particular, own prices, price of competitors (hotel and apartments) and visitor income. The econometric methodology is based on the fractional heterogeneous panel data cointegration model of Ergemen and Velasco (2017), which allows for heterogeneity of listings, dependency between them and fractional integration to accommodate possible slow reversion in the adjustment process to the equilibrium. For comparison, we also apply the classic common correlated effects (CCE) estimator.

We use information on the P2P accommodation market in the Canary Islands on the basis of 5 years of monthly data and analyse the degree of substitution relative to hotels and apartments as most important types of lodging in this archipelago.

In general, there is some evidence of cointegration at the listing level (in particular, there is (fractional) cointegration for 23% of listings), indicating that the market could have a long-run equilibrium relationship between occupancy rates and own prices, cross prices, and income for some listings. Also, mean group estimates for all slopes show that demand is own-price-inelastic, while cross-price elasticity indicates that traditional accommodation and P2P in the Canary Islands are substitute goods. Income elasticity is lower than 1 and is not statistically significant, indicating that the demand for tourism in the Canary Islands is insensitive to the economic situation in the origin countries.

6.1 Theoretical implications

From the theoretical point of view, both fractional and standard cointegration analyse long-run equilibria. Standard cointegration assumes series with unit roots that are connected in a stationary long-run relationship. However, both assumptions are rather restrictive and relaxed within the fractional cointegration framework. This framework allows both original series and cointegrating errors to be fractionally integrated instead. Hence, the fractional cointegration analysis allows a more general long-run relationship in which short-term system shocks are slowly accommodated. The analysis conducted in our study confirms that a fractional analysis might indeed be more appropriate. Therefore, managers could reach erroneous conclusions if they did not account for fractionally integrated framework in their empirical study.

Both the CCE approach and the fractional panel model rely on a rank condition which requires the number of factors be not too large. More specifically, this number must not exceed the rank of the matrix of averaged factor loadings which in practice is unobserved (see De Vos et al. 2021, for a classifier aiming at testing this condition). In our context, we assume that the variables in our model suffice to guarantee that this condition holds. In fact, this could be an additional motivation for Rodríguez-Caballero (2022)’s approach which via the assumed multi-level cross-sectional dependence might reduce the overall number of factors for each province.

6.2 Practical implications

From a managerial point of view, our results are consistent with the standard demand theory which points out that own and competitor prices play a key role in determining long-term demand. Therefore, estimating such models can prove important for demand strategies.

In particular, our analysis allows managers to use such models to conduct insightful studies on price sensitivity and income and to design long-term policy recommendations in the P2P industry. On the one hand, managers or policy makers can use the estimated price elasticities to analyze their impact on total revenue (Song and Witt 2000) or predict the effectiveness of policies implemented in reducing or increasing demand (Dwyer et al. 2010). For example, following Song and Witt (2000), managers seeking to calculate total revenue for listings could focus on the price elasticity and price of tourism to calculate the marginal revenue (i.e., extra revenue generated by a one-unit increase in sales). If P2P demand were price elastic (in absolute value it will exceed unity), marginal revenue will be positive. However, for our data, the price elasticity of demand is negative and below one (in absolute value) or inelastic, implying that an increase in the own price listings will result in a less than proportional change in listing demand. Therefore, as a result, total tourist revenue would decrease.Footnote 5

On the other hand, the estimated cross-price elasticity can help managers determine whether their listings are substitutes, complements or are independent as regards other products (e.g., other neighbouring Airbnb listings, hotels). This information is essential for formulating pricing strategies and for analysing risk associated with various products (e.g., multi-unit hosts managing several types of listings where important substitute relations could exist among them). For example, in our case, Airbnb listings are substitutes for the traditional hotel accommodation industry. However, the low cross-price elasticity implies a low sensitivity to price changes in the hotel accommodation industry in the long-run and a rather low level of competition between Airbnb listings and hotels. Therefore, managers need not to respond to a competitor’s price reduction as fast as they would have to if the cross-price elasticity was high. Finally, the estimated income elasticity carries little information. In fact, our results cannot help managers to implement marketing strategies for Airbnb listings such as identifying potential markets for their products when changes in income are expected.Footnote 6

Another interesting implication of the cointegrated model is that it allows managers to perform sensitivity analyses on demand in the long-run. For example, managers or policy makers can use the proposed model to analyze the effects of own and substitution price elasticities on demand using subsamples of listings, e.g., Airbnb listings located in different provinces. We, also, could analyze demand differences between the type of hosts (e.g., single-unit- and multi-unit hosts), which can represent different levels of professionalism in the P2P industry.

Certain limitations should be acknowledged. One is technical in nature, namely the dependency of our proposed method on balanced panel data which reduces considerably the sample data. Another methodological limitation is that we do not incorporate the lagged dependent variable and/or weakly exogenous variables as regressors because there is no econometric model for this specification. In particular, to our knowledge, whereas for the standard model there is the model of Chudik and Pesaran (2015), there is no equivalent model for the fractional case. The inclusion of the lagged dependent variable would allow for a dynamic panel structure. This could account for certain dynamic features of the lodgings. In particular, the occupancy rate could not only depend on the relative average daily rate of the listing, the average daily rate of hotels in the municipality and GDP, but also in a general manner on its own past. In any case, in the fractional model, the memory parameter allows to capture at least some of this dynamic behaviour.

Future research could try to incorporate the lagged dependent variable as regressor in the fractional panel model and include the impacts of other economic factors on demand such as exchange rates and marketing policies. With more available data, it would also be interesting to investigate the effect of COVID-19 on the analysed long-run demand relationship. Alternatively, to the employed CCE approach, one could model the cross-sectional dependence using Bai (2009)’s approach of interactive fixed effects. This model uses principal components analysis to estimate both the structural parameters and the common component. The latter might contain information on common features in the data.