1 Introduction

Firm failure prediction, also called financial distress prediction or default prediction, provides early warning signals about a firm’s financial health. Research on this topic dates back to the pioneering work of Beaver (1966) and Altman (1968). Many researchers have since examined what determines the failure of firms because such predictions play a crucial role in decisions taken by company managers, investments made by investors, credit decisions reached by creditors, and customer credit ratings prepared by banks (Sun et al. 2014).

Over the last half century, extensive research has investigated failure prediction on large and listed firms (e.g., Altman 1968; Fernando et al. 2019; Tinoco and Wilson 2013; Zhang et al. 2010). However, unlike mature and larger firms that are more experienced and capable of detecting environmental signals (Klimas et al. 2021), SMEs are under growing pressure from global competition and complexity, and they encounter a greater challenge to improve and sustain their business performance over time (Kraus et al. 2012). In the aftermath of the global financial crisis of 2007 to 2009, banks have implemented stricter creditworthiness standards. Consequently, small firms are confronted with greater difficulty in accessing credit and obtaining financial resources, and greater requirements in borrowing money from financial entities compared to larger firms. These constraints have imposed additional challenges on businesses and have been conducive to more loan defaults and business failure (Ciampi et al. 2021; El Kalak and Hudson 2016; Garcia-Martinez et al. 2023).

In accordance with a widely accepted definition provided by the European Union, a firm employing fewer than 250 persons with an annual turnover of less than €50 million is defined as an SME (El Kalak and Hudson 2016; Gupta et al. 2015; Gallucci et al. 2023). New ventures, also called entrepreneurial ventures, are newly established firms in the early stages of their organizational lives and are unstable in performance due to their unfinalized business models (Zhou et al. 2023). The criterion typically used to distinguish new ventures from more established firms is years in business. However, there is no consensus on the year cutoff; the time constraint ranges from six to ten years (Li 2020). In this study, following Fonseca et al. (2022), we examine new ventures that started activity between 2010 and 2018 in Portugal. Due to their small size and lack of experience, new ventures are more vulnerable than established firms to changes in the economy and the market (Gregg and Parthasarathy 2017), and they face more obstacles to market entry, insufficient skilled staff, limited network connections, and poor financing conditions (Kücher et al. 2020). Although a venture may become an established firm at a later stage of its organizational life cycle, the way it evolves may present unique resource and strategic challenges that an SME may not encounter (Patel et al. 2021). During economic downturns, such as the financial crisis, new ventures could be more vulnerable than more established SMEs to the impact of credit crunches resulting from financial instability due to their limited collateral and relatively young age (Gaies et al. 2023; Guedes et al. 2021).

SMEs are the backbone of economies and constitute a significant portion of the business sector (Argente-Linares et al. 2013), accounting for more than 95% of the Organization for Economic Cooperation and Development (OECD) members' firms and 99% of all firms in the European Union. Indeed, they contribute to more than half of all value added (Filipe et al. 2016; Garcia-Martinez et al. 2023). According to Pordata (2023), SMEs account for 99.9% of total enterprises in Portugal. Due to their simpler structure compared to large firms, SMEs can respond faster to changing economic conditions and meet the needs of local consumers (Altman and Sabato 2007). Moreover, the development of SMEs at a faster pace is beneficial for employment opportunities in the labor market and contributes to national growth and development by providing innovation potential, increasing employment levels, and creating added value (Rotar et al. 2019). In comparison to SMEs, new ventures face greater challenges in securing resources during the early stages of development as a consequence of their liability of newness and smallness (Freeman et al. 1983; Patel et al. 2021; Stinchcombe 1965). However, the growth of new ventures significantly contributes to job creation and economic development (Li 2020). In terms of technological and organizational innovation, new ventures can offer an important boost to productivity in their particular industry (Resende et al. 2016) and enhance their sustainability (Berman et al. 2022). Moreover, they are regarded as the engine of the economy. Consequently, it is important to understand the determinants of failure for both SMEs and new ventures and to determine the differences between them, given their different natures. In our study, we will distinguish the different determinants of failure for unlisted SMEs and new ventures. 
For this purpose, we use the financial ratios, size, and age of these firms because unlisted firms are particularly vulnerable to changes in financial ratios (Gupta and Gregoriou 2018).

In this current study, we draw on a near census of unlisted SMEs and new ventures in Portugal. The Portuguese economy is dominated by SMEs, accounting for 68.3% of the value added and 77.4% of employment. Over the period 2014 to 2018, there was an increase in employment and value added in SMEs of 15.2% and 27.0%, respectively (European Commission 2019; Patel and Guedes 2021). We apply a stepwise regression technique and the logistic model to develop one-year default prediction models for SMEs and new ventures separately. Portuguese law requires firms to disclose their financial statements regardless of age or size, in contrast to most countries where unlisted companies are not mandated to do so (Guedes et al. 2022). Thus, our analysis uses a unique and large sample of 229,855 SMEs and 101,645 new ventures covering 2010 to 2018. To validate the developed prediction models, the receiver operating characteristic (ROC) curves are reported for both in-sample and out-of-sample models. The findings show a significant variance in failure predictors between SMEs and new ventures. Consequently, it is highly recommended that they should be treated separately when assessing their risk of failure.

This study makes several contributions. First, it differs from the existing literature in that it focuses on default prediction in the context of SMEs and new ventures in a geographically different sample. It is important to note that the outcomes of firms’ default risk predictions vary significantly from country to country (Altman and Sabato 2005; Kovacova et al. 2019). Moreover, the study contributes to the firm failure literature in that it is among the first to look at failure prediction for both unlisted SMEs and new ventures playing major roles in boosting economic development (Resende et al. 2016; Rotar et al. 2019). Second, due to the different nature of SMEs and new ventures, we investigate whether the same set of predictors applies to both types of business. We found that the determinants affecting failure vary between SMEs and new ventures, which illustrates the need to rely on different financial indicators for appropriate default assessment. We provide evidence for investors, creditors, and banks indicating that separate treatment should be implemented when assessing the default risks of SMEs and new ventures. Last, when comparing the classification accuracy of the prediction models for both firm types, the new ventures model performs less well than that of SMEs, which echoes the need to employ more non-financial metrics in order to evaluate the survival and failure risk of new ventures.

2 Literature review

2.1 Firm failure prediction

Firm failure is costly not only to owners or promoters of firms but also to the whole economy as a result of jobs lost and bad loans by banks and credit unions (Popoola 2022). Inspired by the research of Beaver (1966) and Altman (1968), studies on predicting corporate default have been undertaken. This kind of early warning prediction of failure benefits the financial stability of the overall banking system, provides evidence for decision making by both managers and creditors, and allows regulators and other agents in the market to identify the potential problems of firms and evaluate the impact on the economic environment (Antunes et al. 2016; Sun et al. 2014). Moreover, differences in country-specific circumstances, such as economic conditions, legal regulations, cultural context, the dynamics of financial markets, and established accounting practices, impact the international applicability of the prediction models (Altman et al. 2017). In the context of Portugal, non-financial firms' total debt reached 115% of GDP in 2015, one of the highest values in the Eurozone, which put great pressure on banks and creditors (Antunes et al. 2016). SMEs account for more than three quarters of employment, and the Portuguese government remains committed to improving entrepreneurship as part of the national strategy to tackle youth unemployment (European Commission 2019). Consequently, it is pivotal to carry out a study in the context of this specific nation. Furthermore, compared with large firms, SMEs have unique financial characteristics and are perceived as having higher credit and operational risks (Andrikopoulos and Khorasgani 2018). It is, therefore, necessary to design special default prediction models for SMEs (Ciampi 2015). Meanwhile, the determinants of new ventures’ performance and survival have received conspicuous attention in the literature due to their unique features (Fonseca et al. 2022; Holmes et al. 2010; Patel et al. 2020). 
However, SMEs and new ventures differ in nature, which leads to different causes of failure. New ventures are more prone to failure due to internal weaknesses stemming from deficiencies in management and economic competencies, and limited knowledge of financial accounting. By contrast, more established SMEs are more likely to fail because of bureaucratic procedures and organizational inertia resulting in reduced profitability (Kücher et al. 2020). Although some studies have revealed that different-sized SMEs have different determinants of failure (see El Kalak and Hudson 2016; Gupta et al. 2018), few studies distinguish between SMEs and new ventures, especially in failure prediction. Our study is designed to fill this gap.

2.2 Financial ratios as predictors

Financial ratios have proved to be important predictors of firms’ financial health and future default risk. In comparison to other types of variables—including market variables—they display superior predictive capacity and are the dominant signals of failure (Cultrera and Brédart 2016; Séverin and Veganzones 2021; Tian et al. 2015). The failure of SMEs can be attributed to poor financial planning (Popoola 2022). Thus, using financial analysis to determine the financial stability of a firm and to detect strengths and weaknesses can be a very useful diagnostic tool to predict the financial health of a business (Gregova et al. 2020). However, financial analysis may bring certain challenges—for example, obtaining financial information from SMEs and new ventures is difficult because they typically lack certified, audited financial statements. Since they do not generate credible financial information on a regular basis, they are much more opaque than large corporations (Berger and Frame 2007; Ciampi and Gordini 2013). Our study is noteworthy in that it covers SMEs and new ventures using reliable financial information, certified by chartered accountants as required by the Portuguese government.

Since the pioneering work of Altman (1968), financial ratios measuring profitability, liquidity, and solvency have been found to be crucial predictors of firms’ bankruptcy. To date, several studies have examined the determinants of failure and default prediction of SMEs and new ventures in several countries. The majority of the default studies are based on the selection of quantitative predictive variables derived from financial ratios dealing mainly with profitability, liquidity, leverage, activity, and efficiency (for example, Altman and Sabato 2007; Fuertes-Callén et al. 2022). According to the theory of organizational ecology, which claims that the market eliminates weak firms as a result of natural selection (Hannan and Freeman 1977), efficient firms maximizing profits will survive and predominate. In unfavorable environmental conditions, firms can increase their chances of survival if they have enough liquidity, whereas failure occurs when highly leveraged firms are unable to meet their debt service obligations (Fuertes-Callén et al. 2022; Miller 1988). Statistical techniques are then applied to detect the significance and discriminatory power of the financial ratios in preparation for building the SME prediction model. In one of the first studies of SME default risk prediction, Altman and Sabato (2007) developed a one-year default risk prediction model using the logistic regression technique on a sample of over 2,000 firms from 1994 to 2002 in the US. Five of the seventeen financial ratios examined, covering the categories of profitability, liquidity, leverage, activity, and coverage, were chosen as the most important predictors of SME failure: EBITDA (earnings before interest, taxes, depreciation, and amortization)/total assets, short term debt/equity book value, retained earnings/total assets, cash/total assets, and EBITDA/interest expense. 
The authors concluded that SMEs require separate treatment from large companies given that banks apply credit risk strategies, credit risk management models, and procedures specific to the SME segment. New ventures also tend to place a high priority on financial planning because liquidity constraints might lead to failure (Kraus et al. 2010). Several studies followed the work of Altman and Sabato (2007) and attempted to predict SME failure. Overall, the results demonstrate the validity of financial indicators in forecasting firms’ default risk, but the results vary from country to country. Kovacova et al. (2019) found that each country prefers a different set of explanatory variables. Table 1 illustrates how the financial indicators that predict firm failure vary by country (e.g. Abdullah et al. 2019; Castillo et al. 2018; Cultrera and Brédart 2016; Fuertes-Callén et al. 2022; Lin et al. 2012; Zhang and Xie 2023). In line with the study of Altman and Sabato (2007), it reveals that profitability, leverage, liquidity, and activity ratios are predictors of failure.

Table 1 Literature review of empirical studies of firm’s failure prediction by financial ratios

2.3 Age and size as predictors

Based on the resource-based view (RBV), the internal characteristics and distinct resources of firms generate a sustainable competitive advantage to improve survival probability (Barney 1991; Kücher et al. 2020). Age and size are found to be important determinants in SMEs and new ventures default prediction (Altman et al. 2010; Fuertes-Callén et al. 2022; Kücher et al. 2020). Studies have shown that the liability of newness—with the likelihood of firm failure declining with age—renders a high failure rate among new organizations (Carroll and Delacroix 1982; Stinchcombe 1965). Unlike established firms that have already reached viability, this liability of newness experienced by new ventures can significantly reduce their chances of survival in the absence of growth (Bruderl et al. 1992). Additionally, access to financial resources for SMEs is affected by age because banks and financial institutions consider older SMEs less risky than younger ones when deciding to offer loans. As SMEs age, their improved capabilities and experience create a more appealing profile and facilitate greater access to loans (Garcia-Martinez et al. 2023). The failure of younger firms is attributed to insufficient resources and capabilities (Thornhill and Amit 2003). However, as firms age, they learn to become more efficient, and the opportunities for survival and growth increase (Wennberg et al. 2016). This prediction has been supported by several studies for both SMEs and new ventures. For instance, Coad et al. (2016), who employed UK new ventures data, showed that survival predictions are expected to improve in the years following entry. Those ventures that survive obtain the resources to weather the trade fluctuations that characterize their early days. 
Similar results were obtained by Gregg and Parthasarathy (2017) whose analysis, based on data from eBay ventures, suggested that an auction firm's chances of remaining in business increase as it extends the number of days it has been in business. Moreover, from the perspective of the industry life cycle, Esteve-Pérez et al. (2018) demonstrated that age is one of the factors decreasing the hazard rate of companies in their intermediate organizational lives. Although SMEs are older and at a later stage of their organizational life cycle than new ventures, age is still found to be a significant factor in predicting default risk (Altman et al. 2010; Matenda et al. 2020).

According to the liability of smallness, the size of firms is inversely related to their failure rates (Freeman et al. 1983). Research on the impact of size on the failure of new ventures has experienced an upward trend. Size is the key source for innovation (e.g., R&D input and technology adoption), which crucially influences SMEs’ performance (Garcia-Martinez et al. 2023). When competing with larger firms, new and smaller-sized firms are more likely to use less debt to access external financing resources. Thus, they tend to lack sufficient financial capital to invest in advanced technologies and find it harder to survive (Garcia-Martinez et al. 2023; Honjo 2000). In short, these financing obstacles constrain their growth (Beck et al. 2006). This is consistent with the study by Bernanke and Gertler (1995), which concluded that bank financing and credit markets are less accessible to smaller firms than to large firms, with bankruptcy more likely to result. Moreover, large companies are more efficient in production due to their ability to use more specialized inputs, better coordination of resources, and scale economies. In addition to adequate financial capital, a greater number of qualified employees can help firms conquer the liability of smallness by increasing their efficiency and profitability (Moser et al. 2017). Starting out with greater numbers allows firms to benefit from their employees’ know-how, thereby boosting productivity and efficiency, and ultimately increasing their likelihood of survival (Fonseca et al. 2022). However, small firms have difficulty in achieving economies of scale, obtaining credit for investment, and securing qualified personnel because they lack adequate resources (Yang and Chen 2009). Therefore, in the marketplace, smaller-sized firms are at a competitive disadvantage and more likely to go out of business. 
To measure size based on financial and human resources, the extant literature has tended to use assets (Fuertes-Callén et al. 2022) and number of employees (Fonseca et al. 2022). The negative relationship between failure probability and new venture size has been tested by several studies, such as Mata and Portugal (1994) for Portugal, Audretsch and Mahmood (1995) for the US, and Resende et al. (2016) for Brazil. For SMEs, the relationship between size and the probability of default has been documented by Altman et al. (2010) and Acosta-González et al. (2019).

3 Methodology

In this section, we discuss the source and selection of the database, the choice of explanatory variables, the statistical methods of this analysis, and the methods employed to evaluate the models’ performance.

3.1 Data

Our sample consists of annual certified financial, performance, and survival information for SMEs and new ventures in Portugal obtained from the Informação Empresarial Simplificada (IES) form, available in the INFORMA D&B database, for the period 2010 to 2018. The data cover all industries and are divided into two separate sub-samples, one of SMEs and one of new ventures. We include SMEs with fewer than 250 employees and annual revenue below €50 million, following the definition provided by the European Union, and new ventures established between 2010 and 2018. We excluded firms that reported no activity or had suspended activities. Observations with negative assets, cash, or liabilities, or with zero employees, were eliminated because they reflect errors and give a misleading picture of firms’ performance and size (Fonseca et al. 2022). Firms in the financial and insurance sectors were excluded from the sample because of their different accounting and typically high leverage, a level that would tend to signify distress for non-financial firms (Fama and French 1992; Gupta et al. 2015; Mselmi et al. 2017). Our final SME sample consists of 772,037 SME–year observations on 229,855 unique SMEs, of which 13,545 failed (a failure rate of 5.89%). The new venture sample comprises 191,237 new venture–year observations on 101,645 unique new ventures, of which 5,457 failed (a failure rate of 5.37%).
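
The sample selection rules above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the `FirmYear` record, its field names, and the sector labels are hypothetical, and only the thresholds (250 employees, €50 million) and the exclusion rules come from the text.

```python
from dataclasses import dataclass

@dataclass
class FirmYear:
    # Hypothetical record layout; field names are illustrative assumptions.
    firm_id: str
    year: int
    employees: int
    revenue_eur: float
    total_assets: float
    cash: float
    liabilities: float
    sector: str    # hypothetical label such as "financial" or "retail"
    active: bool   # False if the firm reported no or suspended activity

# Sectors excluded because of their different accounting and high leverage.
EXCLUDED_SECTORS = {"financial", "insurance"}

def keep_sme_observation(obs: FirmYear) -> bool:
    """Apply the exclusion rules described in the text to one firm-year."""
    if not obs.active:
        return False
    if obs.sector in EXCLUDED_SECTORS:
        return False
    if obs.total_assets < 0 or obs.cash < 0 or obs.liabilities < 0:
        return False
    if obs.employees <= 0:
        return False
    # EU size thresholds: fewer than 250 employees, revenue below EUR 50 million.
    return obs.employees < 250 and obs.revenue_eur < 50_000_000

def failure_rate(failed_firms: int, unique_firms: int) -> float:
    """Share of unique firms that failed, as a percentage."""
    return 100.0 * failed_firms / unique_firms
```

As a check on the reported figures, `failure_rate(13_545, 229_855)` rounds to 5.89 and `failure_rate(5_457, 101_645)` rounds to 5.37.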

To validate the prediction performance of our models, the dataset was divided into two sub-periods: the estimation period (from 2010 to 2016), used to develop the prediction models, and the out-of-sample period (from 2017 to 2018), also called the hold-out sample, used to validate the developed models’ prediction performance. Following Altman and Sabato (2007) and Gupta et al. (2014), the firm–year observations have been lagged by one period, so that data from the current period are used to predict the default probability of firms over the next period. In other words, we predict failure one year in advance.
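
The lagging and the estimation/hold-out split can be sketched as follows. The dictionary layout and the function name are hypothetical, and assigning each pair to a sub-period by the year of the outcome (rather than the year of the predictors) is an assumption, since the text does not say which year governs the split.

```python
ESTIMATION_YEARS = range(2010, 2017)  # 2010 to 2016 inclusive
HOLDOUT_YEARS = range(2017, 2019)     # 2017 to 2018 inclusive

def lag_and_split(predictors, outcomes):
    """Pair each outcome in year t with predictors from year t - 1.

    predictors: dict mapping (firm_id, year) -> predictor values
    outcomes:   dict mapping (firm_id, year) -> 1 if failed, else 0
    Returns (estimation_rows, holdout_rows); each row is
    (firm_id, outcome_year, lagged_predictors, failed).
    """
    estimation, holdout = [], []
    for (firm_id, year), failed in sorted(outcomes.items()):
        x_prev = predictors.get((firm_id, year - 1))
        if x_prev is None:
            # No observation in the previous period, so no lagged pair exists.
            continue
        row = (firm_id, year, x_prev, failed)
        if year in ESTIMATION_YEARS:
            estimation.append(row)
        elif year in HOLDOUT_YEARS:
            holdout.append(row)
    return estimation, holdout
```

Firm-years without a prior-period observation drop out of both sets, which is one reason the number of usable pairs can be smaller than the number of raw firm-year observations.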

3.2 Variables selection

3.2.1 Dependent variables

In our study, the dependent variable is failure coded as 1 if the firm fails. The failure of the firm is taken to mean that the firm is out of business in a firm–year reported by INFORMA D&B with the following outcomes: dissolution, extinction, insolvency, and legal closure. It equals 0 if none of those outcomes happen up to the end of the observation period (Patel et al. 2020).

3.2.2 Selection of predictor variables

Building on past literature predicting SMEs and new ventures default risk, in addition to our database, this study focuses on financial ratios as well as size and age, and compares SMEs and new ventures. We initially incorporated ratios that have previously been shown to be effective in forecasting the probability of default in five categories: liquidity and solvency, profitability, leverage, activity and efficiency, and age and size. Table 2 presents the list of predictors, their definitions, and sources.

Table 2 List of Variables

In the category of liquidity and solvency ratios, we use cash to assets ratio (CTA), working capital to assets ratio (WCTA), intangible assets to assets ratio (IATA), and EBITDA-to-interest coverage ratio (EBITDAIE). A firm with better liquidity and solvency is less likely to default on financial obligations, and intangible assets are capitalized more aggressively by firms facing distress. Consequently, we expect these ratios to be inversely related to default (Gupta et al. 2015; Jones 2011).

In the profitability ratios category, we include return on assets (using EBITDA), return on sales (using net income), return on assets (using net income) and retained earnings to assets ratio (RETA). They are used to assess a company's capacity to generate profits. Healthier firms are expected to have higher values than distressed ones (Gupta et al. 2015). Consequently, negative relationships are expected between them and failure.

Regarding the category of leverage ratios, short-term D/E ratio (STDE), creditors to assets ratio (TCTA), debt ratio (TLTA), and financial expenses to assets ratio (FETA) are used. Financial leverage represents the amount of liabilities that require constant payments (Zhang et al. 2010). Thus, these ratios are used to measure the ability of firms to pay their liabilities. They are expected to have positive relationships with default because a firm associated with high liabilities has an increased risk of failure.

With respect to the activity and efficiency ratios, we select creditors to debtors ratio (TCTD), taxes to assets ratio (TTA), and profit per employee (PPE). They are expected to have negative relationships with default since financial distress is indicated by low values of these ratios. Excessive trade debtors (also called accounts receivable) would result in insufficient cash flow and tight capital chains, which are harmful to a firm’s financial health. According to Hudson (1986), small firms tend to file for bankruptcy because of trade creditors, which is where most of their liabilities come from. Thus, the smaller the TCTD, the higher the hazard of default. Regarding TTA, the more profit a firm earns, the greater the amount of taxes paid. A firm is less likely to default on its tax obligation when it has a healthy liquidity position (Gupta et al. 2014). In addition, PPE measures how much profit each employee generates over a given period of time (Lin et al. 2012). Firms benefit from a high profit per employee ratio because it indicates that their employees are well trained and productive.
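
To make the ratio construction concrete, the sketch below computes four of the predictors named above. The exact definitions are listed in Table 2, which is not reproduced here, so the formulas are assumptions based on the ratio names: CTA as cash over total assets, WCTA as working capital (current assets minus current liabilities) over total assets, TLTA as total liabilities over total assets, and TCTD as trade creditors over trade debtors. The function names are hypothetical.

```python
from typing import Dict, Optional

def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator

def example_ratios(cash: float, current_assets: float,
                   current_liabilities: float, total_liabilities: float,
                   total_assets: float, trade_creditors: float,
                   trade_debtors: float) -> Dict[str, Optional[float]]:
    """Compute a few illustrative predictors under the assumed definitions."""
    working_capital = current_assets - current_liabilities
    return {
        "CTA": safe_ratio(cash, total_assets),
        "WCTA": safe_ratio(working_capital, total_assets),
        "TLTA": safe_ratio(total_liabilities, total_assets),
        "TCTD": safe_ratio(trade_creditors, trade_debtors),
    }
```

Returning `None` for a zero denominator is a design choice for this sketch only; how undefined ratios were treated in the actual sample is not stated in the text.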

Size is calculated as the natural logarithm of total assets. In line with Fonseca et al. (2022), we use the natural logarithm of the number of employees as a second measure of size. Finally, age is defined as the natural logarithm of the current year minus the year of birth.
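
The size and age transformations can be written out directly. One detail the definition leaves open is a firm observed in its year of birth, for which the age in years is zero and the logarithm is undefined; returning `None` in that case is an assumption of this sketch, not a rule stated in the text. The function names are hypothetical.

```python
import math
from typing import Optional

def size_from_assets(total_assets: float) -> float:
    """Size measured as the natural logarithm of total assets (must be > 0)."""
    return math.log(total_assets)

def size_from_employees(employees: int) -> float:
    """Size measured as the natural logarithm of the number of employees (> 0)."""
    return math.log(employees)

def log_age(current_year: int, birth_year: int) -> Optional[float]:
    """Age as ln(current year - year of birth).

    Returns None when the firm is observed in its birth year (age zero),
    because ln(0) is undefined. This handling is an assumption.
    """
    years = current_year - birth_year
    if years <= 0:
        return None
    return math.log(years)
```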

3.3 Statistical model applied

Beaver (1966) and Altman (1968) were pioneers in this field, using univariate analysis and multivariate models, respectively. In the majority of papers on modeling SME default prediction, researchers prefer to use traditional methodologies, such as discriminant analysis and logistic models (Ciampi et al. 2021). Altman (1968) was the first to apply multiple discriminant analysis (MDA) to develop the Z-score model, based on five key financial ratios, to detect bankruptcy potential in manufacturing corporations. MDA remained, for many years, the statistical method most widely used by scholars for default prediction models. However, MDA rests on two restrictive assumptions that are often violated in default prediction: (a) the independent variables follow a multivariate normal distribution; (b) the variance–covariance matrices of the failed and non-failed groups are equal. Furthermore, the standardized coefficients in MDA models cannot be used to identify the relative importance of the various variables because they cannot be interpreted as the slopes of a regression equation (Altman and Sabato 2007). The logistic model, on the other hand, is the most popular conditional probability method in forecasting corporate failure. It does not require multivariate normally distributed variables or equal dispersion matrices, and it makes no assumptions about the prior probabilities of failure or the distribution of the independent variables. Consequently, the logistic method is typically thought of as less demanding than MDA (Balcaen and Ooghe 2006; Ohlson 1980). Additionally, the hazard model has been prevalent since it was proposed by Shumway (2001). However, a discrete time hazard model with a logit link is basically a panel logistic model that controls for the age of firms or incorporates a macro-dependent baseline hazard (Gupta et al. 2018; Nam et al. 2008). 
Other methodologies, such as support vector machine techniques, artificial neural networks, and the Grabit model, have also been widely employed in modelling the default risk of SMEs (Ciampi and Gordini 2013; Mselmi et al. 2017; Sigrist and Hirnschall 2019). In our study, one-year default prediction models based on logistic techniques are employed.

Ohlson (1980) initially proposed a logistic regression in default prediction that does not require the two restrictive assumptions and that facilitates the use of disproportional samples. The dependent variable in our study is binary (failed/non-failed), which seems to be well matched with the logistic model. In past decades, logistic regression has been widely deployed in the default prediction literature (Altman et al. 2010; Altman and Sabato 2007; Fernando et al. 2019; Geulen et al. 2023; Gupta et al. 2014; Tinoco and Wilson 2013; Lin et al. 2012; Matenda et al. 2020). Therefore, in this study, we adopt logistic regression as our statistical technique and develop one-year default prediction models. The method calculates a score (probability) for each firm based on the explanatory ratios in order to derive a classification of either healthy or failed.

The function of the logistic regression in our study can be written as follows:

$$P\left({Y}_{it}=1|{X}_{it-1}\right)=\frac{1}{1+{e}^{-\beta {X}_{it-1}}}$$
(1)

where

\(P\left({Y}_{it}=1|{X}_{it-1}\right)\) is the failure probability of firm i at time t;

\(\beta\) is the vector of the coefficients;

\({X}_{it-1}\) is the vector of explanatory variables (financial ratios, size, and age) for firm i, observed at the end of the previous period.

In the current study, two multivariate logistic models are developed, one for SMEs and one for new ventures.
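
Equation (1) can be evaluated for a single firm as in the sketch below. The coefficient values in the usage note are placeholders chosen only to show the mechanics, not estimates from this study, and treating the first element of each vector as an intercept term (with the predictor fixed at 1) is an assumption, since Eq. (1) does not show a constant explicitly.

```python
import math
from typing import Sequence

def failure_probability(beta: Sequence[float], x_prev: Sequence[float]) -> float:
    """Evaluate P(Y_it = 1 | X_it-1) = 1 / (1 + exp(-beta . X_it-1)).

    beta:   coefficient vector
    x_prev: explanatory variables observed at the end of period t - 1
    """
    if len(beta) != len(x_prev):
        raise ValueError("beta and x_prev must have the same length")
    z = sum(b * x for b, x in zip(beta, x_prev))
    # Numerically stable form: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

For example, with placeholder values `beta = [-3.0, 2.0]` and `x_prev = [1.0, 0.5]`, the linear index is -2.0 and the function returns roughly 0.12. A positive coefficient raises the predicted failure probability as the corresponding variable increases, and a negative coefficient lowers it.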

3.4 Performance evaluation

Following Altman et al. (2010), Altman and Sabato (2007), and Tinoco and Wilson (2013), we report the misclassification matrix and the receiver operating characteristic (ROC) curve to assess and validate the prediction performance of our empirical models. One way to assess how well a predictive model performs is to calculate the percentage of cases that it classifies correctly. The correctly classified cases are the true positives (the firms are “failed”, and our model classifies them as expected failures) and the true negatives (the firms are “non-failed”, and the model classifies them as expected non-failures). The misclassified cases, by contrast, are the Type I errors (the proportion of failed firms classified as expected non-failures, that is, false negatives when failure is the positive class) and the Type II errors (the proportion of non-failed firms classified as expected failures, that is, false positives). The misclassification matrix, from which the percentage of correctly identified objects is obtained, is produced as follows: (a) choosing a score cut-off, which, in most cases, equals the proportion of failed firms in our sample; (b) marking all results below the cut-off as expected failures (in our study, they are failed objects) and all those above as expected non-failed objects; (c) cross-tabulating the expected failures and non-failures against the actuals; (d) calculating the proportion of correctly classified failures and non-failures according to the model. Overall classification accuracy is calculated as 1 minus the average of the two error rates (Altman and Sabato 2007; Anderson 2007).
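
Steps (a) to (d) can be sketched as follows. The text marks results below the cut-off as expected failures, which fits a score where lower means riskier. This sketch instead assumes the score is the predicted failure probability from Eq. (1), so a higher score means riskier and firms at or above the cut-off are the expected failures; that orientation, like the function name, is an assumption.

```python
from typing import Dict, Sequence

def misclassification_matrix(scores: Sequence[float],
                             actual: Sequence[int],
                             cutoff: float) -> Dict[str, float]:
    """Cross-tabulate expected against actual failures.

    scores: predicted failure probabilities (assumed: higher = riskier)
    actual: 1 for failed firms, 0 for non-failed firms
    cutoff: score threshold, e.g. the proportion of failed firms in the sample
    """
    tp = fn = fp = tn = 0
    for score, failed in zip(scores, actual):
        expected_failure = score >= cutoff
        if failed == 1 and expected_failure:
            tp += 1
        elif failed == 1:
            fn += 1   # failed firm classified as expected non-failure
        elif expected_failure:
            fp += 1   # non-failed firm classified as expected failure
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both failed and non-failed firms are required")
    type_1 = fn / (tp + fn)   # share of failed firms misclassified
    type_2 = fp / (fp + tn)   # share of non-failed firms misclassified
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "type_1_error": type_1,
        "type_2_error": type_2,
        # Overall accuracy: 1 minus the average of the two error rates.
        "accuracy": 1.0 - (type_1 + type_2) / 2.0,
    }
```

Averaging the two error rates weights failed and non-failed firms equally, so the measure is not dominated by the much larger non-failed group.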

Originating in signal detection theory, the receiver operating characteristic (ROC) curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the threshold used to discriminate between failed and non-failed firms changes. The area under the ROC curve (AUROC) provides a more comprehensive summary of classification accuracy. An AUROC of 0.5 means that the model has no discriminatory power, while 1 implies perfect prediction performance. In general, performance is considered acceptable when the AUROC is between 0.7 and 0.8, whereas excellent discrimination exists if it is no less than 0.8 (Altman et al. 2010; Hosmer Jr et al. 2013).
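
The AUROC can also be read as the probability that a randomly chosen failed firm receives a higher score than a randomly chosen non-failed firm, with ties counted as one half. The sketch below computes it from that pairwise definition rather than by integrating the curve; it assumes higher scores mean higher predicted failure risk, and the quadratic loop is meant for illustration on small inputs only.

```python
from typing import Sequence

def auroc(scores: Sequence[float], actual: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted failure probabilities (assumed: higher = riskier)
    actual: 1 for failed firms, 0 for non-failed firms
    """
    failed = [s for s, y in zip(scores, actual) if y == 1]
    healthy = [s for s, y in zip(scores, actual) if y == 0]
    if not failed or not healthy:
        raise ValueError("both failed and non-failed firms are required")
    wins = 0.0
    for f in failed:
        for h in healthy:
            if f > h:
                wins += 1.0   # failed firm correctly ranked as riskier
            elif f == h:
                wins += 0.5   # ties count as one half
    return wins / (len(failed) * len(healthy))
```

A model that assigns every firm the same score obtains 0.5, and a model that ranks every failed firm above every non-failed firm obtains 1.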

4 Empirical results

In this section, we first present the descriptive statistics of the selected predictor variables. We then report the univariate analysis of each potential variable and the correlation matrix to determine the candidates for the multivariate model. Next, the multivariate logistic models are estimated by stepwise selection for the SME and new venture categories, respectively. Finally, we compare the results from the two categories and present the validation on the hold-out samples. To limit the influence of outliers, all variables introduced into the empirical regression are winsorized at the top and bottom 2.5 percentiles.
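Winsorizing caps extreme values at chosen percentiles instead of deleting them. The following sketch shows the idea for a single variable; the percentile rule (linear interpolation between order statistics) and the toy values are assumptions, since the text does not state which percentile definition was applied.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation (one common definition; an assumption)."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def winsorize(values, lower_q=0.025, upper_q=0.975):
    """Cap values below the lower percentile and above the upper percentile."""
    ordered = sorted(values)
    low_cap = percentile(ordered, lower_q)
    high_cap = percentile(ordered, upper_q)
    return [min(max(v, low_cap), high_cap) for v in values]


# Toy ratio with one extreme value at each end, invented for illustration.
toy_ratio = [-500] + list(range(2, 41)) + [1000]
capped = winsorize(toy_ratio)
print(capped[0], capped[-1])
```

The observations stay in the sample, so the number of firm-years is unchanged; only the two tails are pulled in to the cap values.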

4.1 Descriptive statistics

Table 3 reports the mean values and standard deviations of all potential variables for the SME and new venture samples, separately for failed and non-failed firms; none of the variables shows extreme variability.

Table 3 Descriptive Statistics and Mean Difference t-Test

In the category of liquidity and solvency, for both SME and new venture samples, CTA has higher mean values in failed firm groups than non-failed groups, while the mean values of WCTA are lower in failed groups. The mean values of profitability ratios, namely ROA (using EBITDA), ROS (using net income), ROA (using net income) and RETA, are all lower in failed groups than non-failed counterparts except ROA (using EBITDA) in the SME sample. The mean value of TCTA in the leverage category is somewhat lower in the failed SME group but higher in the failed new venture group. TLTA has higher mean values in the failed firm group for both samples. Activity ratios, size, and age in both samples produce lower mean values in failed firm groups than in non-failed groups.

Moreover, we find from Table 3 that the majority of the variables present significant mean differences between the failed and non-failed groups at the significance level of 1% except IATA, STDE, and TCTA in the SME sample, and EBITDAIE and TCTA in the new venture sample.

4.2 Univariate analysis and correlation matrix

Before developing the multivariate models, we conduct a univariate analysis for each sample, an approach generally recommended in the literature for determining the discriminatory power of the explanatory variables (Altman et al. 2010; El Kalak and Hudson 2016; Gupta et al. 2014). Each covariate is introduced in turn as the sole independent variable in the logistic model, and the significance and sign of its coefficient are reported in Table 4. The covariates exhibiting the expected signs and significant discriminatory power are then selected for the correlation test to inspect whether any of them are highly correlated. When ratios within a group are highly correlated, we exclude the covariates with the lower Wald Chi2 values from the univariate regression because a lower value indicates lower explanatory power (El Kalak and Hudson 2016).
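To make the univariate step concrete, the sketch below fits a logistic model with an intercept and a single covariate by Newton-Raphson and returns the slope together with its Wald Chi2 statistic. It is an illustration under stated assumptions, not the estimation routine used in the study: the function name and toy data are invented, and no robust standard errors or dummies are included.

```python
import math


def fit_univariate_logit(x, y, max_iter=50, tol=1e-10):
    """Fit P(failure) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Returns (b0, b1, wald_chi2), where wald_chi2 = b1^2 / Var(b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        if abs(step0) + abs(step1) < tol:
            break
        b0 += step0
        b1 += step1
    # Var(b1) is the slope entry of the inverse information matrix,
    # evaluated at the converged estimates (up to the tolerance).
    variance_b1 = h00 / det
    return b0, b1, b1 * b1 / variance_b1


# Toy ratio where failed firms tend to have lower values (invented data).
toy_x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_y = [1, 1, 0, 1, 0, 1, 0, 0]
intercept, slope, wald = fit_univariate_logit(toy_x, toy_y)
print(slope < 0, wald)
```

A negative slope here means that higher values of the ratio are associated with lower failure odds, which is the kind of sign check described above; the Wald Chi2 value is what the later correlation screen uses to rank covariates.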

Table 4 Univariate Analysis Results of SMEs and New Ventures Samples

Table 4 shows that, according to the coefficients of variables following the methods used by El Kalak and Hudson (2016) and Gupta et al. (2014), CTA, IATA, STDE, FETA, and TCTD do not show the expected signs in the SME sample. Similarly, in the new venture sample, CTA, IATA, EBITDAIE, STDE, TCTA, and FETA do not show the expected signs. Consequently, these variables are dropped, and the correlation test is applied to the remaining variables.

Table 5 presents the correlation matrices of the covariates. Some of the covariates need to be excluded from the following multivariate regression due to high collinearity. In the SME sample, among 13 covariates, WCTA is highly correlated with RETA and TLTA, and TLTA has strong correlations with four other covariates. In the profitability ratios, ROA (using EBITDA), ROS (using net income) and ROA (using net income) are highly correlated with each other, and a substantial degree of correlation is found between size (assets) and size (employees). Following El Kalak and Hudson (2016), when the two covariates are found to be highly correlated with each other, we retain the one with the higher Chi2 value from the univariate test. Thus, in the SME sample, EBITDAIE, ROA (using net income), RETA, TCTA, TTA, PPE, size (assets), and age are included in the multivariate analysis. By undertaking the same inspection in the new venture sample, ROA (using EBITDA), ROS (using net income), RETA, TCTD, TTA, size (assets), size (employees), and age are designated for the next analysis. The financial variables selected are in line with the previous work of El Kalak and Hudson (2016) and Gupta et al. (2015).
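The retention rule described above (for a highly correlated pair, keep the covariate with the higher univariate Chi2) can be sketched as a simple greedy screen. The correlation threshold of 0.7, the variable names, and the Chi2 values below are hypothetical; the text does not report the cut-off used to define "highly correlated".

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def screen_covariates(data, wald_chi2, threshold=0.7):
    """Visit covariates from highest to lowest Chi2 and keep each one only
    if it is not highly correlated with a covariate already kept."""
    kept = []
    for name in sorted(data, key=lambda k: wald_chi2[k], reverse=True):
        if all(abs(pearson(data[name], data[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical covariates and univariate Chi2 values, for illustration only.
toy_data = {
    "ratio_a": [1, 2, 3, 4, 5],
    "ratio_b": [2, 4, 6, 8, 11],   # almost collinear with ratio_a
    "ratio_c": [5, 1, 4, 2, 3],
}
toy_chi2 = {"ratio_a": 50.0, "ratio_b": 80.0, "ratio_c": 20.0}
selected = screen_covariates(toy_data, toy_chi2)
print(selected)
```

In this toy case ratio_a is dropped because it is nearly collinear with ratio_b, which has the higher Chi2, while ratio_c is kept because it is only weakly correlated with ratio_b.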

Table 5 Correlation Matrices

4.3 The development of the multivariate model

Table 6 presents the two multivariate failure prediction models estimated by the logistic regression technique for SMEs and new ventures, respectively. The dependent variable is binary (1 for failure and 0 for non-failure), and the stepwise selection technique with forward elimination is utilized to detect the optimal statistically significant set of covariates from the preceding univariate and correlation analysis under the 5% significance level. Possible multicollinearity is eliminated through stepwise selection, and the robust standard error option is used to cope with possible heteroscedasticity. Furthermore, to account for the effects of different years and industries (manufacturing and non-manufacturing) on the final results, we use year and industry dummies.

Table 6 Multivariate Logistic Regression for SMEs and New Ventures Samples

4.3.1 Failure prediction model for SMEs

As shown in Table 6, seven covariates, namely EBITDAIE, ROA (using net income), RETA, TCTA, TTA, size (assets), and age, are selected by stepwise regression to construct the final failure prediction model for SMEs. One of the profitability ratios, RETA, is selected as the most discriminatory ratio one year prior to failure. RETA evaluates a firm’s accumulated profit over its total assets. A one-unit increase in RETA multiplies the odds of SME failure by 0.858 (exp(−0.153)), that is, it reduces them by about 14%. These seven covariates all exhibit highly significant discriminatory power and display the expected signs of their coefficients in the final model for SMEs. From Table 7 and Fig. 1, we find that the overall classification accuracy of our developed model for SMEs is 75.13% with AUROC at 0.8284.
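The factor 0.858 follows directly from the logistic coefficient: in a logistic model, a one-unit increase in a covariate multiplies the odds of failure by exp(coefficient). The sketch below reproduces that arithmetic and shows what it implies for a failure probability; the 10% baseline probability is a hypothetical value chosen only for illustration.

```python
import math


def odds_multiplier(coefficient):
    """Factor applied to the odds of failure per one-unit covariate increase."""
    return math.exp(coefficient)


def shifted_probability(baseline_probability, coefficient):
    """Failure probability after a one-unit increase, given a baseline."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * math.exp(coefficient)
    return new_odds / (1 + new_odds)


reta_factor = odds_multiplier(-0.153)        # RETA coefficient from the text
print(round(reta_factor, 3))

# Hypothetical baseline: a firm with a 10% failure probability.
new_probability = shifted_probability(0.10, -0.153)
print(round(new_probability, 4))
```

The multiplier applies to the odds rather than to the probability itself, so the change in probability depends on where the firm starts.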

Table 7 Misclassification rates and accuracy performance of the different models

Fig. 1 The area under the ROC curve for SMEs estimation

4.3.2 Failure prediction model for new ventures

The same approach is applied to form the prediction model for new ventures. The eight covariates that remained from the univariate analysis are all selected through the stepwise procedure. As in the SME model, the most discriminatory ratio is a profitability ratio, but here it is ROA (using EBITDA), the ratio of EBITDA to total assets, which indicates a firm's efficiency in generating profits (earnings before interest, taxes, depreciation, and amortization) from its assets. A one-unit increase in ROA (using EBITDA) multiplies the odds of new venture failure by 0.890 (exp(−0.116)). The final prediction model for new ventures is estimated with these eight highly significant variables, all of which show the expected signs of their coefficients. As shown in Table 7 and Fig. 2, the new venture estimation model reports an overall classification accuracy of 68.29%, which is lower than that of the SME model. Moreover, the area under the ROC curve in our study is 0.7468, which still represents an acceptable classification performance.

Fig. 2 The area under the ROC curve for new ventures estimation

4.4 Validation

In order to measure the forecast ability of the developed multivariate models, we follow the most frequently and broadly used validation approach in the field of bankruptcy and financial distress prediction by introducing hold-out sample validation (Altman and Sabato 2007; Fernando et al. 2019; Gupta et al. 2015; Tinoco and Wilson 2013). We retain the data for the last two years (2017 to 2018) as the hold-out sample and apply our developed multivariate models to the validation period.
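A hold-out split by period can be sketched as follows. The record layout and field names are hypothetical; only the choice of 2017 and 2018 as the validation years comes from the text.

```python
def split_by_year(records, holdout_years):
    """Separate firm-year records into estimation and hold-out samples."""
    estimation = [r for r in records if r["year"] not in holdout_years]
    holdout = [r for r in records if r["year"] in holdout_years]
    return estimation, holdout


# Toy firm-year records, invented for illustration only.
toy_records = [
    {"firm": "A", "year": 2015, "failed": 0},
    {"firm": "A", "year": 2016, "failed": 0},
    {"firm": "A", "year": 2017, "failed": 1},
    {"firm": "B", "year": 2016, "failed": 0},
    {"firm": "B", "year": 2018, "failed": 0},
]
estimation_sample, holdout_sample = split_by_year(toy_records, {2017, 2018})
print(len(estimation_sample), len(holdout_sample))
```

Splitting on time rather than at random means the model is always judged on years it has not seen, which is closer to how an early warning model would be used in practice.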

The error rates, overall classification accuracy, and area under the ROC curve for the SME and new venture hold-out samples are set out in Table 7 and Figs. 3 and 4. The failure rate (the proportion of failed firms) in each sample is used as the cut-off rate to calculate the error rates for each model (Altman and Sabato 2007). Type I and Type II error rates are then reported for each model. The overall classification accuracy of each model is calculated as 1 minus the average of the two error rates (Altman and Sabato 2007; Fuertes-Callén et al. 2022; Gupta et al. 2014), and the accuracy for SMEs is higher than that for new ventures.

Fig. 3 The area under the ROC curve for SMEs validation

Fig. 4 The area under the ROC curve for new ventures validation

We find that, in the SME hold-out sample, the predictive accuracy rises slightly to 75.79%, while the AUROC remains high at 0.8265, close to the 0.8284 of the estimation sample. In the hold-out sample of new ventures, with the obvious decreases in error rates, both the classification accuracy and the AUROC improve, to 73.00% and 0.8071, respectively. Better performance in validation was also reported by Tinoco and Wilson (2013). This may be explained by the smaller number of observations in the hold-out samples and, hence, the lower number of failure events to be predicted. All the models show better predictive accuracy compared with some existing studies (Altman and Sabato 2007; Fuertes-Callén et al. 2022; Gupta et al. 2014). In validation, both models present AUROC values of at least 0.80, and the SME models exceed 0.82 in both estimation and validation. They, therefore, demonstrate robust predictive ability.

5 Discussion

In this study, we develop the failure prediction models for SMEs and new ventures by using financial ratios, age, and size. The final multivariate models perform well and provide several interesting findings.

In the SME model, seven covariates are selected to form the prediction model, which is in line with the findings of El Kalak and Hudson (2016), Gupta et al. (2015) and Zhang and Xie (2023). However, unlike the findings of Lin et al. (2012), profit per employee is eliminated in the process. The difference could be due to the different measurements of failure. Lin et al. (2012) adopted four groups of financial health whereas we used the ultimate organizational status—failure or not. This is also consistent with the fact that size (employees) is excluded in the correlation matrix test, implying that, in our case, SMEs as more established firms are less likely to rely on human resources. Instead, the financial resources measured by size (assets) play a more crucial role in the failure risk prediction. Compared with previous studies—for example, 74.41% overall classification accuracy in the SMEs default risk prediction of Altman and Sabato (2007), and 64.85% of Gupta et al. (2014)—our overall classification accuracy shows better prediction ability. In addition, the area under the ROC curve of our developed model is 0.8284, which represents good model performance on prediction and is better than that of Altman and Sabato (2007) and Gupta et al. (2014).

In the new venture model, eight covariates are included in the final prediction model. Three of them are profitability ratios, which is in line with the findings of Delmar et al. (2013) and Fuertes-Callén et al. (2022) who highlighted the importance of profitability ratios in the early stages of new ventures. Moreover, our findings show that both size measured by assets and employees are determinants in predicting new venture failure. This is consistent with the study of Fonseca et al. (2022). New ventures facing liabilities of smallness experience greater obstacles in the early stage of their organizational life cycle. With increased size, new ventures can benefit from economies of scale and scope, and develop production capacity in an efficient way to increase their survival chances (Fonseca et al. 2022). Compared with some extant studies in new venture survival prediction, such as Fuertes-Callén et al. (2022) who reported 65.9% overall classification accuracy by using financial ratios, our finding yields greater accuracy.

Comparing the final prediction models for SMEs and new ventures in Table 6, the retained earnings to assets ratio, the taxes to assets ratio, size (assets), and age are all explanatory variables that significantly discriminate the hazard of failure for both SMEs and new ventures, and both size and age rank highly as discriminatory variables in the stepwise regression. Size measured by assets is always ranked highly among the selected ratios in the stepwise process for both types of firm, indicating the importance of financial resources for the performance of both SMEs and new ventures (Fonseca et al. 2022; Garcia-Martinez et al. 2023). In the SME sample, a one-unit increase in size (assets) multiplies the odds of failure by 0.593 (exp(−0.522)), whereas a one-unit rise in size (assets) multiplies the odds of new venture failure by 0.640 (exp(−0.446)) (see Table 6), which indicates that new ventures are more sensitive to the effects caused by the liability of smallness (Freeman et al. 1983). Making the same calculation for age, we find that the bankruptcy odds of new ventures are affected more by an increase in age, which aligns with the liability of newness (Carroll and Delacroix 1982; Stinchcombe 1965). Another interesting finding is that the highest-ranked predictor for both SMEs and new ventures is a profitability ratio. This is in line with Gelashvili et al. (2022), who proposed that profitability is one of the most pivotal ratios for the long-term survival of any type of firm. However, for SMEs, it is the retained earnings to assets ratio whereas, for new ventures, it is the return on assets (EBITDA). This can be explained by the fact that the retained earnings to assets ratio is a cumulative profitability ratio. Compared with mature SMEs, new ventures have not had time to build up their cumulative profits (Altman 2013). Net worth increases if profits are retained by the firm and are not distributed. Thus, retained earnings measure the performance of firms both financially and operationally since their establishment and are used for failure prediction (Akerlof and Shiller 2010; Fuertes-Callén et al. 2022). However, EBITDA is a relatively short-term profitability indicator measuring the true productivity and earning power of the firm’s assets (Altman 2013). New ventures are younger firms experiencing liabilities of newness and smallness; they need to improve their current production capacity to sell greater quantities, generate revenues, and reduce their costs so that their chances of survival are enhanced (Fonseca et al. 2022).

Additionally, when comparing the overall classification accuracy, error rates, and AUROC of SMEs and new ventures one year prior to failure, the results indicate that the SMEs failure prediction model outperforms the new ventures counterpart. This is consistent with the findings of Fuertes-Callén et al. (2022) who contend that, even though the predictive capacity of new ventures’ financial ratios is lower than that of mature firms, it is nevertheless considerable. Consequently, we advise that the failure prediction models for SMEs and new ventures should be developed separately.

5.1 Theoretical implications

Although financial distress and default prediction have been investigated for a long time, this study has improved the existing literature from several perspectives. First, our contribution enriches the firm failure literature in corporate finance research. We have expanded firm failure prediction studies to a new geographic area (Portugal) and developed the geography-specific failure prediction models where SMEs and new ventures play crucial roles in boosting the national economy. Second, for the first time, we have examined financial factors, age, and size to predict the probability of failure for all non-financial unlisted SMEs and new ventures based on a near-census, large, and reliable database. Furthermore, we have compared the different determinants and predictive accuracy of risk of failure concerning these two types of firm, and have provided evidence that new ventures are different due to the liabilities of newness and smallness (Freeman et al. 1983; Stinchcombe 1965). Finally, we help disentangle some of the reasons why accounting information may not always receive adequate attention from investors until a firm becomes mature (Wright and Robbie 1996). Compared with SMEs that are more structured and enjoy longer survival times and greater economies of scale, new ventures may have only a limited amount of resources available for investment in innovation and marketing, which is not conducive to maintaining a competitive advantage and sustained performance (Jeng and Pak 2016). Thus, they may have financial problems at the beginning stage arising from poor sales and profits figures. Yet, some of them ultimately succeed because of their entrepreneurial orientation to innovativeness, risk taking, and proactiveness (Anwar et al. 2022; Fuertes-Callén et al. 2022).

5.2 Practical implications

Our study offers several practical implications. First, our findings may help different stakeholders, such as the lenders, financial institutions, entrepreneurs, and policy makers, to identify firms with high default risk and to improve the efficiency of their decision making. In Portugal, the majority of credit is granted by banks (Antunes et al. 2016). With more accurate firm failure prediction models provided, banks are better able to precisely evaluate the financial prospects of unlisted SMEs and new ventures. Consequently, the lenders’ risk of losses resulting from credit misallocation is lowered, and the funds in the lending portfolio are more fairly distributed to firms (Andrikopoulos and Khorasgani 2018). Moreover, our study provides evidence that, when assessing firms’ risk of default, new ventures need to be treated differently from SMEs. Indeed, the financial predictors can perform better when more non-financial factors are taken into account, but the predictive capacity of financial ratios is still considerable. Stakeholders are encouraged to use certified and transparent financial data to evaluate the financial well-being and operational stability of SMEs and new ventures. Furthermore, entrepreneurs and managers of new ventures need to pay greater attention to overcoming the liabilities of smallness and newness, and to improving the resources to achieve sustained development because new ventures are more sensitive to the effects of an increase in size and longer survival. Finally, we recognize that entrepreneurial capabilities are necessary to ensure firms’ long-term survival (Wiedeler and Kammerlander 2021). Thus, our findings provide policy makers with insights on implementing specific training programs for the entrepreneurs of SMEs and new ventures. 
The aim here is to improve entrepreneurs’ ability to recognize the importance of various financial ratios and the differences between them in order to adopt proactive measures and, consequently, reduce the risk of failure. By monitoring crucial ratios, policy makers are better placed to offer the targeted support that vulnerable firms need in different economic conditions.

5.3 Limitations and future research perspectives

This study has several limitations that can provide opportunities for further research. First, there is a lack of non-financial ratios to predict firm failure risk, especially for new ventures where both internal and external causes are likely to exercise influential roles (Klimas et al. 2021; Kücher et al. 2020). Future studies could examine the roles of non-financial ratios, such as innovation, entrepreneurial orientation, industry factors, founding conditions, outsider assistance, organizational structure and internationalization strategy (Anwar et al. 2022; Argente-Linares et al. 2013; Azeem and Khanna 2024; Garcia-Martinez et al. 2023). For instance, although we have controlled for industry fixed effects, the economic upheavals can significantly affect the performance of firms in the information and communication technology sector (Ogunrinde 2022). A more detailed taxonomy of industries can help improve the accuracy of failure risk prediction. Moreover, with technological development and Covid-19, digital entrepreneurship has been experiencing a rapid transformation (Czakon et al. 2022). Given the limitations of our database, we encourage future investigation of digitalization factors, and comparative analysis of pre- and post-Covid periods concerning failure prediction in future studies. Additionally, age has been found to be a significant failure predictor for both SMEs and new ventures in our study. As different causes of failure have been proposed that predominate at particular stages in an organization's lifespan as categorized by age quartiles (Kücher et al. 2020), future studies can further explore whether the likelihood of failure varies in different age quartiles, especially for new ventures. Finally, logistic techniques have been used in this study to form the multivariate default prediction models, which are among the most commonly used models. 
However, we encourage the testing of additional techniques, such as neural networks, decision trees, and data mining, to see if classification accuracy can be improved.

6 Conclusion

We have employed panel logistic regression to develop one-year failure prediction models for Portuguese unlisted SMEs and new ventures using an extensive database. From the empirical findings, all of our developed and validation models have manifested significant classification performance, with the SME prediction model performing even better. In our developed multivariate prediction models, profitability, TTA, size, and age appear to be important predictors for both SMEs and new ventures. These findings are in line with prior empirical investigations; see, among others, Altman and Sabato (2007), El Kalak and Hudson (2016), Fuertes-Callén et al. (2022), Gupta et al. (2015), and Zhang and Xie (2023). Although more covariates were selected for the final new venture model, the SME prediction model performed better on the basis of the overall classification accuracy and performance validation, yielding indicators in more categories, while new ventures were more susceptible to the effects of increased size and age.

Overall, our findings not only provide investors and creditors with early warning signals of firm performance and risk of failure but also support the position that predicting the risk of default requires SMEs and new ventures to be treated separately given their different natures.