1 Introduction

Optimal fertilization in crop production has the potential to improve yields, nutrient use efficiencies, and consequently the profitability of agriculture (Schut and Giller 2020; Bado et al. 2018). Additionally, optimal fertilization reduces the amount of non-point pollution from fertilizer and manure, attributed to injudicious nutrient management (Tandy et al. 2021). In most countries, the optimum fertilizer dose and fertilization strategies are established by extension services, academic research, and fertilizer companies with the objective to maximize crop yield and profitability. These fertilizer recommendations (FR) are usually aided by plant (Izsaki 2009; Weetman and Wells 1990; Bell 2023) or soil diagnostics (van Heerwaarden 2022), using empirical algorithms that translate soil or plant nutrient concentrations into a desired nutrient dose in order to achieve the target crop yield. Both methods require insights into the relationship between concentrations in soils or plants and the corresponding growth and yield response curves, usually obtained in field experiments using different rates of fertilizers (Bell 2023). How soil- and plant-based methods will complement each other is currently under debate; however, sensor-derived estimates (of both) will certainly contribute to more tailor-made fertilizer practices that can match crop demand with the actual soil nutrient supply or availability.

Plant diagnostics use whole shoot or plant part analysis to assess deficiencies or toxicities in view of the nutrient requirements (Izsaki 2009; Weetman and Wells 1990) and have been used in various in-season FR for both macro and micro nutrients (Olfs et al. 2005). The use of plant diagnostics is based on the idea that the plant itself is the best indicator of deficiencies in the nutrient supply from the soil (Lemaire et al. 2019). Although the result of a plant analysis can be used to decide about the necessity, the ‘optimum’ timing, and the ‘optimum’ fertilizer dose, its value for FR systems has been challenged because nutrient interactions and physiological growth stage influence the critical level (Marchand et al. 2013; Martínez et al. 2002). The actual nutrient level in the plant is the result of the interaction between nutrient supply, water availability, growth stage, and possible stressors affecting crop nutrient uptake (Cadot et al. 2018). Hence, plant nutrient concentration alone is not an accurate proxy of nutrient requirement (Lemaire et al. 2019).

Soil diagnostics is by far the most common method to optimize crop fertilization (van Heerwaarden 2022). This soil-based approach includes sampling and laboratory analysis of bioavailable nutrients, thereby relating the crop yield response to the nutrient dose applied while accounting for soil nutrient supply (Eckert 1987; Jordan-Meille et al. 2012). The relationships underlying FR are by definition empirical and only valid for the agroecosystem properties for which they have been derived, limiting their applicability for other regions and land uses (Lemaire et al. 2019). Critical soil nutrient levels have been defined for each extraction method (soil test) to ensure that crop growth is not limited assuming a desired target yield level (Breitschuh et al. 2008; Tunney et al. 1997). These levels are often defined as the soil nutrient level that corresponds to a crop yield of 90% or 95% of the maximum yield; this level is used to distinguish deficient from non-deficient soils (Cox 1992; Mortvedt 1977; Steinfurth et al. 2022; Conyers et al. 2013). Substantial variation in these critical levels has been found across regions, farming systems, and continents, being affected by soil sampling and laboratory procedures to measure the plant availability of nutrients (Mortvedt 1977; Ros et al. 2011; Jordan-Meille et al. 2012), the methodology to derive critical nutrient levels from experimental data, crop and soil management practices (Lemaire et al. 2019), and the agroecosystem properties controlling crop growth (Bell et al. 2013a; Conyers et al. 2020; Conyers et al. 2013; Lemaire et al. 2019). The latter properties include not only crop type and crop variety but also soil properties such as pH, texture, groundwater depth, and soil organic matter (Valkama et al. 2009; Schut and Giller 2020). 
This has been confirmed by recent data-driven approaches showing that the crop yield response to nutrient inputs is affected by various agroecosystem properties (Ros et al. 2016; Coulibali et al. 2020).

Soil testing classifies the soil fertility status as low, medium, high, or very high on the basis of measured soil nutrient levels, following an empirical relationship between crop yield and nutrient levels, defined here as soil test calibration (Jordan-Meille et al. 2012; Mitchell and Huluka 2012a, b). Both crop yield and the agronomic efficiency of fertilizer nutrient inputs depend on this soil fertility status, with high crop yield responses to nutrient additions expected in soils with low nutrient availability (Voss 1998) and low responses in soils with a high fertility status (Babu et al. 2016). As a result, soil tests enable farmers to enhance crop yield and improve the agronomic efficiency of fertilizers by optimizing nutrient additions. The determination of critical soil nutrient levels serves as a basis for the conventional agronomic build-up-and-maintenance approach (Locke and Hanson 1991; Zone et al. 2020). At low soil nutrient levels, nutrients are added in excess of crop nutrient removal at a given target yield, thus elevating the soil fertility level (build-up), while nutrient addition is equal to crop nutrient removal at an adequate soil nutrient level, thus sustaining this adequate soil fertility status over time (maintenance). Apart from an appropriate assessment of the crop nutrient demand, based on a target crop yield and nutrient contents in harvested crop parts, the accuracy of FR systems thus depends on the correct derivation of these critical soil nutrient levels. However, substantial variation in those levels has been found across agroecosystems (Lemaire et al. 2019) and there is an urgent need for standardized approaches to derive critical soil nutrient levels. The current uncertainty in these critical nutrient levels is one of the reasons for inefficient use of fertilizers leading to environmental losses and associated costs (Abay et al. 2022; Conyers et al. 2013; van Heerwaarden 2022; Fryer et al. 2019). 
Given that most fertilizer recommendations originate from field experiments from the 1960s and 1970s, one might further question their robustness for current agricultural conditions (Zhang et al. 2021). Taking into account the economic and environmental challenges for farmers to optimize crop yields, there is a clear need for improved and scientifically sound fertilizer recommendations (Conyers et al. 2013; Slaton et al. 2022; Steinfurth et al. 2022; Jordan-Meille et al. 2012).
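The build-up-and-maintenance logic described above can be illustrated with a minimal sketch. The function name, the linear build-up term, and all numbers are illustrative assumptions and are not taken from any cited FR system; what an FR system advises at soil nutrient levels well above the critical level is outside the scope of this sketch.

```python
def recommended_dose(soil_test, critical_level, crop_removal, buildup_rate):
    """Return a nutrient dose (kg/ha) under a build-up-and-maintenance rule.

    soil_test and critical_level are in mg/kg, crop_removal in kg/ha.
    buildup_rate (kg/ha per mg/kg of deficit) is a hypothetical parameter.
    """
    if soil_test < critical_level:
        # Build-up: replace crop removal and add extra to raise the soil level.
        deficit = critical_level - soil_test
        return crop_removal + buildup_rate * deficit
    # Maintenance: replace only what the harvested crop removes.
    return crop_removal


# Hypothetical example values.
low_soil = recommended_dose(soil_test=10.0, critical_level=20.0,
                            crop_removal=25.0, buildup_rate=2.0)
adequate_soil = recommended_dose(soil_test=22.0, critical_level=20.0,
                                 crop_removal=25.0, buildup_rate=2.0)
print(low_soil, adequate_soil)
```

The sketch makes explicit why the recommended dose is sensitive to the critical level: shifting `critical_level` changes both which soils receive a build-up dose and how large that dose is.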

A critical review of the derivation of critical soil nutrient levels, their link to soil fertility classes, and the underlying agronomic concepts has rarely been made. This review aims to unravel the impacts of site conditions and methodological factors (see Fig. 1) on the accuracy of critical soil nutrient levels in order to evaluate the opportunities for improved soil-based FR systems. We focus on phosphorus (P), potassium (K), and zinc (Zn) and extend these insights to boron (B), iron (Fe), manganese (Mn), and copper (Cu) where possible. We excluded nitrogen (N) because the required external input of N is controlled far more by crop N demand than is the case for the aforementioned nutrients: biological processes (mainly N mineralization) rather than chemical processes determine the soil N supply, which is usually a minor part of the total N taken up by crops in global agriculture. The ranges derived in critical levels are compared with observed ranges in medium soil fertility status. The latter range is related to crop yields that equal approximately 80–95% of their maximum value, being linked to the cut-off point to derive critical soil nutrient levels (for details see Section 3.1).

Fig. 1 Two experimental approaches to determine the soil phosphorus critical levels: (A) a pot experiment and (B) a field experiment with similar experimental treatments.

In more detail, we compared the observed range in critical levels defining the medium soil fertility class from FR systems with the observed range derived from peer-reviewed pot and field experiments. We hypothesized that the variation in critical soil nutrient values declines when one accounts for the methodological conditions under which these values have been derived. If this hypothesis is true, then there is a high potential for a build-up-and-maintenance approach using generally applicable soil tests that are independent of region (climate), land use (crop type), and soil type. We also evaluated the impacts of site conditions in terms of region and crop type to see if they affect the critical levels. In this article, we first review the methodological factors determining critical soil nutrient levels along with their impacts, advantages, and disadvantages (Section 2). Next, we describe the critical soil nutrient levels for all common soil tests as used in FR as well as the experimental observations from literature, and quantitatively assess the impact of site conditions and methodological aspects on these critical soil nutrient levels (Section 3). Furthermore, we evaluate the value of soil tests to improve the efficiency of fertilization and describe the opportunities to do so as well as the main bottlenecks (Section 3).

2 Methodological factors determining critical soil nutrient levels

Critical soil nutrient levels are by definition associated with the soil nutrient test used (Jordan-Meille et al. 2012) and with the relationship between that test and crop yield responses to fertilization. Important factors controlling this relationship include (i) soil sampling intensity, sampling depth, and sample preparation; (ii) soil extraction method; (iii) experimental type; and (iv) statistical approaches applied to link crop response to soil nutrient levels (Jordan-Meille et al. 2012; Colomb et al. 2007; Heckman et al. 2006).

2.1 Soil sampling and sample preparation

Although soil nutrient levels usually decline with soil depth, fertilizer trials linking crop nutrient responses to soil nutrient concentrations are usually limited to the top soil. The sampling depth varies from < 5 cm up to 60 cm or more, depending on nutrient, cropping system, and climate (Bell et al. 2013b; Speirs et al. 2013; Brennan and Bell 2013; Agegnehu et al. 2015; Dodd and Mallarino 2005; Holford and Doyle 1992; Cox and Barnes 2002). A soil depth of 0–30 cm is most often used, assuming that the majority of the nutrients are taken up from the top layer and ignoring the contribution of nutrient uptake from the subsoil (Conyers et al. 2020). The soil depth affects the relationship between soil nutrient levels and the crop nutrient response because soil management practices and fertilizer application strategies affect the distribution of nutrients over depth. For example, minimum tillage coupled with broadcast application might reduce the amount of available nutrients in the subsoil (Bell et al. 2013b). Others showed that broadcasting P and K fertilizers increased soil nutrient levels over the ploughing layer whereas deep banding application of fertilizers increased P and K in the deeper soil layer only (Yuan et al. 2020). Consequently, the derived critical soil nutrient levels are valid for the analyzed soil depth only, limiting the applicability of FR to situations that have comparable nutrient distributions over depth (Cox 1992).

Soil sample preparation and handling, including storage conditions, homogenization, drying, grinding, sieving, and moisture content (Savoy 2013; Slaton et al. 2022; Jordan-Meille et al. 2012), also influence the nutrient concentrations thereby affecting the relationships between (critical) soil nutrient levels and crop yield (Slaton et al. 2022; Barbagelata and Mallarino 2013). Consequently, aspects of soil preparation and processing, historical management with regards to tillage, and fertilizer application need to be considered when studies show variation in critical soil nutrient levels (Slaton et al. 2022).

2.2 Extraction methods

There is a large number of soil tests available to extract plant-available nutrients (Jordan-Meille et al. 2012; Mortvedt 1977; Srivastava et al. 2008; Csathó et al. 2002). Most of the soil tests use chemical extractants to determine the amount and nutrient species that are readily available or could become available for crop uptake throughout the growing season, also referred to as bioavailable nutrients (Nafiu et al. 2012), since only a small fraction of the total nutrients in soil is available for crop uptake (Alva 1993). Conceptually, three interconnected soil nutrient pools can be distinguished including (i) the actual available nutrient pool in soil solution; (ii) a potentially available pool that can become available due to chemical and biological processes, often called the labile or reactive nutrient pool; and (iii) a non-available or fixed nutrient pool (Harmsen et al. 2005). The nutrients in solution are readily available for plant uptake, whereas their concentration is controlled by mineralization and immobilization for nutrients that are mediated by soil biology, and by sorption, desorption, precipitation, and dissolution for nutrients whose concentration is dominated by soil chemical processes (Wang et al. 2004; Islam et al. 2017; Bilias and Barbayiannis 2019). Soil tests strongly vary in methodology, including factors such as ionic strength, molarity, pH, soil-to-solution ratio, shaking time, and subsequent filtration or centrifugation. As a consequence, the observed nutrient concentration might reflect the actual concentration in soil solution (as is the case for the 0.01 M CaCl2 method (Sánchez‐Alcalá et al. 2014; Houba et al. 2000)) or exceed the concentration in soil solution by several orders of magnitude (Neyroud and Lischer 2003). Extraction methods can be classified as single nutrient extractions like Olsen P (Olsen 1954) or as multi-nutrient extractions like Mehlich 3 (Mehlich 1984) and CaCl2 (Houba et al. 2000). 
Multi-nutrient soil extractants have proven advantageous for practical, budgetary, and environmental reasons (Ussiri et al. 1998; Bortolon and Gianello 2012, 2010). More recently, alternatives such as soil sensors using spectroscopy have emerged that could potentially replace these soil extractants (Mohamed et al. 2018).

Regarding the major plant nutrients, most soil tests developed were focused on P and, to a lesser extent, on cations such as K, Ca, and Mg and micronutrients such as Zn, Cu, Mn, Mo, and Fe (Table 1). The soil tests for P include extractions with 0.01 M CaCl2 (Houba et al. 2000), ammonium lactate (Egnér et al. 1960), sodium bicarbonate (Olsen 1954), calcium acetate lactate (Schüller 1969), Mehlich 1, 2, and 3 (Mehlich 1953; Mehlich 1978; Mehlich 1984), and Bray 1 and 2 (Bray and Kurtz 1945), with the lowest P concentrations found in the 0.01 M CaCl2 method and the highest P concentrations found in the Bray 2 method (Wuenscher et al. 2015). For the cations K, Ca, and Mg, common extractants include ammonium acetate (Doll and Lucas 1973; Simard 1993) and multi-extractants such as Colwell (Colwell 1963; Colwell and Esdaile 1968) and Mehlich 1, 2, and 3 (Mehlich 1978; Mehlich 1984). Micronutrient concentrations have been estimated using diethylenetriaminepentaacetic acid (DTPA) (Lindsay and Norvell 1978), ethylenediaminetetraacetic acid (EDTA) (Trierweiler and Lindsay 1969), and the Mehlich 1, 2, and 3 methods. Methods for boron include hot water (Berger and Truog 1939), CaCl2 (Houba et al. 2000), DTPA (Lindsay and Norvell 1978), and Mehlich extractions (Mehlich 1978; Mehlich 1984). The soil test selected has strong impacts on nutrient levels as illustrated by Steinfurth et al. (2021).

Table 1 Details on the soil extraction method, corresponding chemical extractants and an overview of the extractable nutrients.

There are no general guidelines to select the most appropriate soil test given agroecosystem properties such as land use, climate, soil texture, and soil acidity (Mashayekhi et al. 2014). In addition, practical and logistical aspects such as ease of method and available equipment also affect the choice of the soil tests to be applied. Logistically, DTPA is preferred over Mehlich 3, HCl, and EDTA simply due to the time needed to perform the analysis (Mortvedt 1977). In the case of P extractants (Wuenscher et al. 2015), water extraction and CaCl2 have been proposed for high-intensity farming systems because those soils are characterized by high available P levels in soil solution, whereas the P-CaCl2 concentration is often below the detection limit in extensive agricultural and natural systems. In addition, owing to its properties, the Olsen P test is better suited to calcareous soils, whereas Bray P is more suitable for acidic soils and Mehlich 3 works across a broad range of soils (Locke and Hanson 1991; Jordan-Meille et al. 2012; Neyroud and Lischer 2003; Fixen et al. 1990). More importantly, none of them has a mechanistically underpinned concept in relation to the processes controlling the actual nutrient supply or buffering, making their relationship with crop nutrient uptake and agronomic efficiency (the change in yield per kg of nutrient added) largely empirical.

To reduce the uncertainty in observed critical soil nutrient levels, it is necessary to standardize soil testing procedures across laboratories. This is partly tackled by various initiatives such as the Global Soil Partnership of the Food and Agriculture Organization and the Wageningen Evaluating Programmes for Analytical Laboratories (WEPAL), though the agronomic derivation of critical soil nutrient levels still lacks a standardized protocol. Note that the analysis protocol in the laboratory might also affect the soil nutrient level determined, as shown for the difference in P between methods using colorimetry and those using inductively coupled plasma analytical procedures (Heckman et al. 2006).

2.3 Experimental type

Critical soil nutrient levels have been derived from short-term manipulation experiments (pot or field or combined), long-term field experiments, and extensive monitoring datasets from agricultural fields (Mortvedt 1977; Ayodele and Agboola 1985). Soil amendment experiments include experiments where the soil is amended by fertilizer to attain a certain soil fertility status after which the crop response to nutrient inputs is determined. To adapt the soil fertility status, nutrients are either added (Cox 1996; Srivastava et al. 2006; Cox and Barnes 2002) or the soil nutrient levels are diluted by mixing the soil with quartz (Corrales et al. 2007; Brtnicky et al. 2021). A difference in crop yield response between the manipulated and unmanipulated soil can be used to quantify the soil nutrient supply. The advantage of this controlled experimental approach is that all factors controlling crop yield, such as soil type, microbial community structure, temperature, and water availability, remain equal except for the soil nutrient level. Since these experiments are usually carried out as pot experiments, their direct applicability for field-level fertilizer recommendations remains limited (Mortvedt 1977).

Field experiments can be either short-term (one growing season) or long-term (several years) and have the highest potential to reflect the actual conditions controlling the crop response to nutrient availability. Nevertheless, results from field experiments are still highly affected by local soil and weather conditions, complicating the upscaling to regional or county level. Similar aspects limit the applicability of long-term field experiments, though the impact of weather might be included in the time series of observed crop yield responses to nutrient availability. In the last case, averaged and more generally applicable relationships between (critical) soil nutrient concentrations and crop yield responses can be established, independent of the actual weather conditions of the growing season (Zhang et al. 2021), in particular when experiments are done for more than 10 years (Mortvedt 1977).

Lastly, critical soil nutrient levels can be derived from large monitoring datasets where crop yields and extractable soil nutrient contents are determined (Speirs et al. 2013; Anderson et al. 2013; Bell et al. 2013b). Their advantage is the use of real field data from many plots and the possibility to include (or assess) the impact of site-specific conditions such as water availability, soil management, fertilizer strategy, and crop rotation. This approach has recently gained attention through new data-driven approaches tested for precision farming purposes, but it has rarely been implemented in fertilizer recommendation systems so far. As with all empirical methods, data pre-processing and analysis should follow transparent and scientifically sound procedures to enable robust application in real-world situations, thereby avoiding issues like overfitting, heteroscedasticity, and multicollinearity (Conyers et al. 2020).

2.4 Statistical models

A critical soil nutrient level can be derived from any experiment where both crop yields and soil nutrient levels have been determined by fitting a mathematical model to those observations (Ayodele and Agboola 1985; Hauser 1973). The critical level is classically defined as the cut-off value above which the crop yield does not respond to added nutrient inputs, mostly set at 90–95% of the maximum yield. The crop response is commonly given as relative yield (RY) in view of the maximum yield observed (Bai et al. 2013). Liu et al. (2017) defined RY as “the proportion between nutrient-limited yield and attainable yield with optimal fertilization.” The common RY cut-off values range between 80 and 99% (Fageria et al. 1997; Colomb et al. 2007; Agboola and Ayodele 1987) and only in exceptional cases has the maximum yield been used. Lowering the cut-off value also lowers the critical soil nutrient level. The wide range in cut-off values is partly related to the crop type, with 95% cut-off values mostly being used for high-value crops such as vegetables and a value of 80% for low-value crops, especially in places where fertilizer use is limited, but the selection remains rather subjective and sometimes even arbitrary. The fact that the majority of studies fail to report the cut-off values as well as the crop yield data supports the idea that none of the soil tests has been developed for generic purposes but rather for specific and regional applications.
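To make the effect of the cut-off choice concrete, the sketch below computes RY and inverts an assumed exponential response curve, RY = 100 (1 − exp(−c·x)), to obtain the soil test value at a given cut-off. The curve shape, the coefficient `c`, and the yield numbers are hypothetical and serve only to illustrate that a lower cut-off gives a lower critical level.

```python
import math


def relative_yield(yield_limited, yield_attainable):
    """Relative yield (%) of the nutrient-limited plot versus the attainable yield."""
    return 100.0 * yield_limited / yield_attainable


def critical_level(ry_cutoff, c):
    """Soil test value where the assumed curve RY = 100 * (1 - exp(-c * x))
    reaches ry_cutoff (in %, must be below 100)."""
    return -math.log(1.0 - ry_cutoff / 100.0) / c


# Hypothetical yields (t/ha) and curvature coefficient.
ry = relative_yield(yield_limited=6.3, yield_attainable=7.0)
levels = {cutoff: critical_level(cutoff, c=0.1) for cutoff in (80, 90, 95)}
for cutoff, level in levels.items():
    print(f"RY cut-off {cutoff}%: critical level {level:.1f} mg/kg")
```

Because the assumed curve flattens towards the maximum yield, each additional percentage point of RY near the plateau corresponds to an increasingly large step in the soil test value, which is one reason why unreported cut-off values hamper comparison across studies.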

Various mathematical models have been applied to relate the crop yield response to nutrient availability (Cox 1992; Hochmuth et al. 2011) and subsequently to derive the critical soil nutrient level needed to achieve the desired yield. These include (i) a linear model where the yield increases continually and linearly with a change in nutrient availability, often being applied when the range in soil nutrient levels is insufficient to achieve maximum yield; (ii) a quadratic model where the yield increases in response to increased soil nutrient availability until a threshold is reached, beyond which the crop response begins to decline; (iii) a linear-plateau model (Waugh et al. 1973) in which the response is approximated by a linear response upon an increase in soil nutrient levels until a point (shoulder point) where the yield stabilizes (plateau); and (iv) an exponential Mitscherlich model where soil nutrient level is exponentially related to yield (Melsted and Peck 1977; Mitscherlich et al. 1913). In addition to these four most common models, some studies use clustering algorithms such as the Cate-Nelson method (Mallarino and Blackmer 1992; Cate and Nelson 1971), alternative exponential models (e.g., used in the Better Fertilizer Decisions for Cropping Systems in Australia projects), or arcsine–log models (Dyson and Conyers 2013). The calibration procedure to define the critical soil nutrient level varies from least squares fitting to graphical methods (Nelsen and Anderson 1977). Variation in the mathematical models certainly contributes to the observed range in critical soil nutrient levels (Perrin 1976), where the Mitscherlich model often results in higher critical values than the Cate-Nelson model (Colomb et al. 2007) and where the linear-plateau model also gives higher values than curvilinear models (Perrin 1976). In addition, the Cate-Nelson method often performs better than the Mitscherlich and quadratic models (Agboola and Ayodele 1987). 
There is currently no broad consensus on which model should be preferred or on the conditions determining the selection of the most appropriate one. One advantage of the Cate-Nelson model is that the critical level is not determined by a relative yield level, as opposed to the Mitscherlich model (Colomb et al. 2007). Melsted and Peck (1977) highlight that the crop response to variation in immobile nutrients such as P, K, Ca, and Mg can best be explained by exponential or quadratic models whereas linear models are usually better for mobile nutrients such as N and B. Despite the advantages of the Cate-Nelson method, it has failed in some cases to identify critical soil nutrient levels (Heckman et al. 2006). A broad statistical analysis of crop yields and soil nutrient levels available from various field experiments across multiple sites would be needed to assess which model is best (in terms of explained variation) to derive a critical level.
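As an illustration of why the Cate-Nelson approach does not depend on a preset relative yield cut-off, the following simplified sketch partitions the observations into two groups along the soil test axis and selects the split that explains most of the variance in relative yield. It is a reduced version of the partitioning idea only, not the full published procedure; the data and the `min_group` parameter are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def cate_nelson_split(soil_test, rel_yield, min_group=2):
    """Return (critical_x, r_squared) for the best two-group split.

    The critical value is placed midway between the two neighbouring
    soil test values at the split.
    """
    pairs = sorted(zip(soil_test, rel_yield))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    total = sum_sq(ys)
    if total == 0:
        return None, 0.0  # no yield variation to explain
    best_x, best_r2 = None, -1.0
    for i in range(min_group, len(ys) - min_group + 1):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied soil test values
        within = sum_sq(ys[:i]) + sum_sq(ys[i:])
        r2 = (total - within) / total
        if r2 > best_r2:
            best_x, best_r2 = (xs[i - 1] + xs[i]) / 2.0, r2
    return best_x, best_r2


# Hypothetical soil P (mg/kg) and relative yield (%) observations.
soil_p = [4, 6, 8, 10, 14, 18, 22, 30]
ry_obs = [45, 55, 60, 72, 93, 95, 97, 96]
critical_x, r2 = cate_nelson_split(soil_p, ry_obs)
print(critical_x, round(r2, 3))
```

When the data show no clear separation between responsive and non-responsive soils, the best split explains little variance, which is consistent with the reported cases where the method failed to identify a critical level.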

3 Comparison of ranges in critical soil nutrient levels derived from experiments and those in medium soil fertility classes

3.1 Overall approach

The critical soil nutrient levels that are used to distinguish soil fertility classes are based on cut-off values for the relative yield. An illustration of the relationship between relative yield and soil nutrient levels, distinguishing low, medium, and high soil fertility classes, is given in Fig. 2. In this example the critical level is set near a relative yield of ca. 95% at the border of the medium and high soil fertility class, while the relative yield at the border between the low and medium class is set at 70%. In agronomy there are no fixed guidelines for the delineation of these classes, making the derivation prone to subjective and arbitrary choices of the researchers involved (Cate and Nelson 1971). For instance, Fageria et al. (1997) classified soil fertility classes as follows: very low (< 70% RY); low (70–95% RY); medium (95–100% RY); and high (> 100% RY), whereas Fageria and Santos (2008) defined the medium class as 90–100% of RY. Ayodele and Agboola (1985) classified fertility classes as low for yield levels at 74%, medium at 94%, and high at yield levels exceeding 97% of the maximum yield. On the other hand, Vieira et al. (2016) classified the medium class at those critical levels where the yield varies between ca. 70 and 90% of the maximum yield. Overall, the medium soil fertility class is mostly related to crop yields that fall within 90 to 95% of the maximum yield, sometimes being as low as 70%. This is well linked to the cut-off values used to derive critical levels.
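The class delineation in the illustrated example can be written as a small rule. The default boundaries (70% and 95% RY) follow the example described above; the function name and the option to pass other boundaries are assumptions added to show how different authors' choices move a soil between classes.

```python
def fertility_class(rel_yield, low_medium=70.0, medium_high=95.0):
    """Assign a soil fertility class from the relative yield (%) expected
    at the measured soil nutrient level."""
    if rel_yield < low_medium:
        return "low"
    if rel_yield < medium_high:
        return "medium"
    return "high"


# The same expected relative yield falls in different classes
# depending on the boundary chosen for the medium/high border.
print(fertility_class(92.0))                    # boundaries of the example
print(fertility_class(92.0, medium_high=90.0))  # a stricter, hypothetical border
```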

Fig. 2 An illustration of the relationship between relative yield and soil nutrient levels, distinguishing low, medium, and high soil fertility classes. In this example the critical level is set near a relative yield of 100% but this may vary in literature from 80–95%. This level is generally near the upper range of the medium fertility class.

We compared the range in published critical soil nutrient values for the “medium” soil fertility status with the specific critical values reported in peer reviewed field trials. We hypothesized that the ranges should be comparable, meaning that all observed critical soil nutrient values from field experiments should fall within the upper and lower critical value defining the “medium” soil fertility class. The critical soil nutrient level is thus assumed to coincide with the medium soil fertility class. We then assessed whether the range in soil nutrient levels in the “medium” soil fertility status can be reduced by accounting for differences in site conditions and methodology.

3.2 Data collection

We first searched for the reported ranges in nutrient levels for different soil fertility classes, as used in fertilizer recommendation systems, which are usually published in grey literature. Most of the studies distinguish three different soil nutrient classes defined as “low,” “medium,” and “high” fertility. In some cases two additional classes were defined, being “very low” and “very high.” We focus our discussion on the critical soil nutrient level represented by the “medium” soil fertility status class. This is achieved by establishing the lower and upper values associated with a specific element and extraction method. The selected studies originated mainly from extension service providers, universities, and agricultural departments. The studies were in some cases complemented with peer-reviewed journal articles based on a Google search using the keywords “fertilizer recommendations.” All selected studies had to contain at least the critical nutrient levels defining the soil fertility status (being classified as very low, low, medium, high, and very high) and a description of the soil test done. Relevant studies and experiments were selected if the data consisted of a range that represented a soil fertility class for a specific element and specified extraction method. Where needed, units were converted to mg kg-1. A total of 36 publications were selected covering 12 nutrients, 17 soil tests, and 1 to 5 soil fertility status classes. Table 2 provides a summarized overview of the range in critical soil nutrient values used to define the medium soil fertility class. A detailed overview of the critical thresholds defining all soil fertility classes per nutrient and extraction method has been included in the supplementary material, Table S1.
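The text does not state which original units occurred in the selected publications. Purely as an illustration of the kind of harmonization involved, the sketch below converts two reporting conventions that are common in soil testing (oxide forms and mg per 100 g of soil) to elemental mg kg-1. The function names are hypothetical, and the conversion factors are computed from standard atomic masses rather than taken from the source.

```python
# Standard atomic masses (g/mol), rounded.
ATOMIC_MASS = {"P": 30.974, "K": 39.098, "O": 15.999}

# Element and oxide composition: (element, atoms of element, atoms of oxygen).
OXIDES = {"P2O5": ("P", 2, 5), "K2O": ("K", 2, 1)}


def oxide_to_element(value, oxide):
    """Convert a value expressed as an oxide (e.g. P2O5) to the elemental form."""
    element, n_el, n_ox = OXIDES[oxide]
    mass_el = n_el * ATOMIC_MASS[element]
    mass_oxide = mass_el + n_ox * ATOMIC_MASS["O"]
    return value * mass_el / mass_oxide


def mg_per_100g_to_mg_per_kg(value):
    """Convert mg per 100 g of soil to mg per kg of soil."""
    return value * 10.0


# Hypothetical threshold: 10 mg P2O5 per 100 g soil, expressed as mg P per kg.
p_mg_kg = oxide_to_element(mg_per_100g_to_mg_per_kg(10.0), "P2O5")
print(round(p_mg_kg, 1))
```

Conversions from area-based units (for example kg per hectare) would additionally require the sampling depth and bulk density, which are not given here and are therefore left out of the sketch.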

Table 2 Summary of the range in medium soil fertility classes and in critical level (the 90% interval) for each element for the given extraction methods.

A systematic search for critical soil nutrient levels in peer-reviewed literature was additionally conducted via SCOPUS on 7 September 2020 using the following search parameters: “( TITLE-ABS-KEY ( crop OR plant OR maize OR zea AND mays OR corn OR rice OR oryza AND sativa OR wheat OR triticum AND aestivum ) ) AND ( ( ( response OR growth OR yield OR “ Nutrient Use efficiency” OR effect OR uptake OR “soil*test*correlat*” OR “soil* test* calibrati*” OR “soil*test*interpr*” OR “cate-nelson*” OR “ mistcherlich*” ) ) AND ( phosphorus OR p OR potassium OR k OR zinc OR zn OR boron OR b OR sulfur OR sulphur OR s OR iron OR fe OR manganese OR mn OR copper OR cu ) ) AND ( LIMIT-TO ( EXACTKEYWORD , “crop yield” ) ) AND ( LIMIT-TO ( LANGUAGE , “English” ) ) AND ( LIMIT-TO ( SRCTYPE , “j” ) ).”

A total number of 123 articles were selected. After excluding publications with incomplete data on soil nutrient levels and related crop response, only 61 were left for further analysis. For each of them, experiment details, site properties, crop type, soil nutrient level and response model used were retrieved. Each critical level was considered a data point, resulting in 448 data points (for details, see Table S2 in SI). A summary of the selected studies is given in Table 3. Again, all units were converted to mg kg-1 where needed.

Table 3 Summary of the analyzed results on critical levels from the systematic literature review.

Phosphorus was the most studied nutrient followed by K and Zn (Fig. 3), probably because P is a major nutrient whose availability is highly controlled by soil properties (in contrast to nitrogen, which is driven by crop demand) while also being a nutrient that often limits crop production (Tandy et al. 2021). Micronutrients such as B, Mn, Mo, and Cu are less frequently studied despite their relevance for the level and nutritional quality of yield, probably due to methodological issues (requiring high laboratory skills and accurate equipment) or simply due to the absence of appropriate micronutrient fertilizers. Nevertheless, micronutrient deficiencies are widespread, limiting crop yield and nutritional quality for human intake (Mortvedt 1977; Berkhout et al. 2017; Kihara et al. 2020; Kihara et al. 2017; Graham and Welch 2000; Wakeel et al. 2018; Breure et al. 2023). Generally, cereals are the most studied crops, with wheat being the most assessed for multiple elements (Fig. 3), followed by maize. This dominance of P and the limited number of crops analyzed show that the scientific basis for most other nutrients in the fertilizer recommendation systems is rather limited. The limited experimental data on nutrients other than P restricts the quantitative analysis of the impact of methodology and site conditions (see Section 3.2) to phosphorus only. Given the similarities in soil and crop uptake processes, we might assume that the conclusions for P are also valid for the other nutrients.

Fig. 3

Overview of the proportion and total number of crops studied per nutrient.

3.3 Statistics used to evaluate factors controlling variation in critical levels

We evaluated the impact of both methodology and agroecology, including site conditions, on the variation in critical soil nutrient levels. The factors considered were crop type and location (with location being a surrogate for various climate and soil properties) and methodological factors such as soil extraction method (see Section 2.2), experimental type (see Section 2.3), and the statistical model applied and cut-off value used (see Section 2.4). Data were checked for normality and log-transformed if needed. Critical soil nutrient levels were standardized to unit variance for each extraction method to allow a proper intercomparison. The standardized data were subsequently used for the statistical analysis. For visualization purposes, data are shown on the original scale in the supplementary material. To reduce the number of options for each of the categorical variables, we combined options that had a low number of observations (n < 5) and those for which the group means were not statistically different (tested via a simple t-test).
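The per-method standardization described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that standardizing "to unit variance" means centering and scaling the values within each extraction method (a z-score), and the function name `standardize_per_method` and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def standardize_per_method(records):
    """Z-score critical levels within each extraction method.

    records: list of (method, critical_level) tuples.
    Assumption: every method has at least two distinct values; otherwise
    the sample standard deviation is undefined or zero.
    """
    groups = defaultdict(list)
    for method, value in records:
        groups[method].append(value)
    # Mean and sample standard deviation per extraction method.
    stats = {m: (mean(v), stdev(v)) for m, v in groups.items()}
    return [(m, (x - stats[m][0]) / stats[m][1]) for m, x in records]

# Hypothetical critical levels (mg P kg-1), for illustration only.
example = [("Olsen", 8.0), ("Olsen", 12.0), ("Olsen", 20.0),
           ("Bray1", 15.0), ("Bray1", 25.0), ("Bray1", 40.0)]
standardized = standardize_per_method(example)
```

After this step, values from different soil tests are expressed on a common, unitless scale, so that a factor such as crop type can be compared across extraction methods.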

The impacts of location, crop type, extractant, experimental approach, statistical model, and cut-off value on critical levels were analyzed, individually and in combination, using multiple linear regression and ANOVA. We first performed a main factor analysis to evaluate the impact of each factor on the variation in critical soil nutrient levels using the explained variance and the RMSE. We additionally evaluated their combined impact via two-way and three-way interactions, assuming that the observed variation in critical soil nutrient levels across studies can be explained by the aforementioned factors.
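For a single categorical factor, the explained variance and RMSE of such a main factor analysis can be illustrated with a one-way fit in which each observation is predicted by its group mean. This is a simplified sketch under stated assumptions: the function name `main_factor_fit` and the data are hypothetical, the RMSE is computed over all n observations rather than the residual degrees of freedom, and the multi-factor regression with interactions used in the study is not reproduced here.

```python
from collections import defaultdict
from math import sqrt

def main_factor_fit(levels, values):
    """Return (explained variance, RMSE) when predicting by group mean.

    levels: factor level of each observation (e.g. crop type).
    values: standardized critical level of each observation.
    """
    groups = defaultdict(list)
    for level, y in zip(levels, values):
        groups[level].append(y)
    grand_mean = sum(values) / len(values)
    group_mean = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Total and residual sums of squares.
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_resid = sum((y - group_mean[g]) ** 2 for g, y in zip(levels, values))
    r_squared = 1.0 - ss_resid / ss_total
    rmse = sqrt(ss_resid / len(values))
    return r_squared, rmse

# Hypothetical factor levels and standardized values, for illustration only.
r2, rmse = main_factor_fit(["A", "A", "B", "B"], [1.0, 3.0, 5.0, 7.0])
```

In this toy example the two group means (2 and 6) account for 80% of the total variation, leaving an RMSE of 1.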

3.4 Variation in critical levels as compared to ranges in medium soil fertility class

Table 2 provides an overview of the range in critical soil nutrient values defining the medium soil fertility classes as compared to the range in critical soil nutrient values derived from field experimental data. A detailed overview of the critical soil nutrient values for all soil fertility classes is given in Table S1, while additional details on the variation in critical levels are available in the shared Excel dataset.

Table 2 shows that the critical soil nutrient values defining the medium soil fertility status varied from 7 to 100 mg P kg-1, from 35 to 280 mg K kg-1, from 0.25 to 5 mg Zn kg-1, from 0.26 to 2 mg B kg-1, from 0.1 to 2.59 mg Cu kg-1, from 2.5 to 11 mg Fe kg-1, and from 0.5 to 200 mg Mn kg-1 for the nutrients included in fertilizer recommendation systems. Critical soil nutrient levels from experimental data varied from 0.33 to 96 mg kg-1 for P, from 25 to 301 mg kg-1 for K, and from 0.17 to 14.1 mg kg-1 for Zn. We conclude that for most of the nutrients, the critical soil nutrient levels are within the range of those used to define the medium soil fertility status as being used in fertilizer recommendation systems (Table 2).

For phosphorus (Fig. 4), the observed range in critical soil nutrient values for the Bray 1 soil test (varying from 3.5 to 45 mg P kg-1) lies mostly within the medium soil fertility class (ranging from 12 to 41 mg P kg-1). Similarly, for POLSEN the critical values ranged from 3.9 to 24 mg P kg-1, similar to the critical values defining the fertility class (ranging between 7 and 25 mg P kg-1). The same results are found for PMEHLICH3 and PCOLWELL, with only a few outliers in both cases. For potassium determined via extraction with ammonium acetate, the critical soil nutrient values ranged between 30 and 301 mg K kg-1, being mostly within the boundaries of the medium soil K fertility status (75 and 280 mg K kg-1). Similar results were found for KMEHLICH1, with critical soil nutrient values varying from 30 to 129 mg K kg-1 and the medium K fertility class being defined as soils with KMEHLICH1 varying between 47 and 90 mg K kg-1. While there is considerable agreement between critical soil nutrient levels derived from field experiments and the levels defining the soil fertility class, there is also a wide range in critical soil nutrient levels across the analyzed studies that warrants further exploration. This variation depends at least on the soil test analyzed, as shown by the small ranges found for POLSEN and PCOLWELL (from experimental data) and POLSEN, PMEHLICH1, PBRAY1, and PBRAY2 (from fertilizer recommendation systems). For potassium (Fig. 5) as well as the micronutrients (Fig. 6), the observed range in experimentally derived critical soil nutrient levels was consistently smaller than the range derived from fertilizer recommendation systems.

Fig. 4

Comparison of critical levels for P across all crops for various extraction methods derived from experiments and those in medium soil fertility classes. Each point represents a critical level derived from an experiment. The extraction method is a Bray 1, b Bray 2, c Olsen, d Mehlich 1, e Mehlich 3, and f Colwell. The green dotted line indicates the lower threshold and the red line indicates the upper threshold of the medium soil fertility class.

Fig. 5

Comparison of critical levels for K across all crops for various extraction methods derived from experiments and those in medium soil fertility classes. The extraction method is a ammonium acetate, b Mehlich 1, and c Mehlich 3. The green dotted line indicates the lower threshold and the red line indicates the upper threshold of the medium soil fertility class.

Fig. 6

Comparison of critical levels for a Zn, b Cu, c Fe, and d Mn across all crops based on DTPA extraction method derived from experiments and those in medium soil fertility classes. The green dotted line indicates the lower threshold and the red line indicates the upper threshold of the medium soil fertility class.

Our analysis confirms that the critical values underpinning FR systems match those derived from experimental data, in particular for P, K, and Zn. However, we also note that this is partly due to the high variation observed in the critical soil nutrient levels defining the lower and upper boundaries of the medium soil fertility status class (Figs. 4, 5, and 6). This agrees with earlier studies done for single nutrients like phosphorus (Jordan-Meille et al. 2012) and has been explained by variation in crop type, climate, laboratory procedures, and soil properties (Jordan-Meille et al. 2012; Colomb et al. 2007).

3.5 Site conditions and methodological aspects affecting critical levels

The variation in critical soil nutrient values was large owing to variation in site conditions and methodology. The following factors were analyzed: crop type, location, experimental approach, and statistical models.

3.5.1 Crop type

The impact of crop type on the variation in critical soil P levels is given in Fig. 7. Using critical levels standardized by extraction method, wheat generally has higher critical levels than maize and soybean, while maize has lower critical levels. The absolute critical levels per crop are shown in supplementary material Figure S1. The critical levels for barley, soybean, clover, and wheat were not significantly different, while these levels varied significantly between canola, maize, cotton, lupin, peanuts, rice, sorghum, and sunflower. Overall, the crop factor was significant in determining the critical level. Differentiating the P recommendation per crop type is therefore logical in order to sustain crop development for crops varying in rooting density and in their ability to take up P from soil. Compared to other site conditions, crop type could explain 5% of the variation in critical limits. The highest critical soil P levels, when standardized by extraction method, are required for barley and wheat and the lowest for maize and rice. The observed differences in critical soil P levels among crops are associated not only with differences in rooting systems but also with the ability of crops to deal with water and nutrient stresses during the growing season (Brouder and Volenec 2008). Burak et al. (2021) found that barley had longer roots than maize, whereas maize had wider roots. Crops with more intensive rooting systems therefore allow lower critical soil nutrient levels than crops where nutrient uptake is limited by soil diffusion. However, the critical level depends not only on the root volume but also on other factors, including crop demand.

Fig. 7

Impact of crop type on critical soil P levels, standardized for the various extraction methods. Critical levels are standardized to unit variance per extraction method.

3.5.2 Location

About 29% of the variation in the critical soil P levels can be explained by location (p < 0.001). The experiments were located in various countries, including Australia, USA, Brazil, Canada, China, Ethiopia, France, Hungary, Kenya, Morocco, Nigeria, Tanzania, Madagascar, and Vietnam, reflecting the variation in agroecological conditions among these countries, such as rainfall, temperature, and soil properties. For instance, Feiziasl et al. (2009) highlight that a reduction in precipitation and temperature increases the critical nutrient level, whereas Conyers et al. (2020) showed that soil type and planting date also affect the critical level. If correct, this would imply that the derived critical soil nutrient levels are very specific to the regions where they have been determined. Since detailed soil and climatic variables for the experiments are unknown, we could not confirm or refute this conclusion. Other data-driven machine learning models have shown their ability to relate the variation in agronomic efficiency and nutrient use efficiency to site properties and management (Coulibali et al. 2020; Kirchmann et al. 2020; Qin et al. 2018), suggesting that a more generic critical soil nutrient level can be determined. All data-driven statistical approaches are by definition limited to the range in site conditions for which the models have been calibrated. Nevertheless, recent innovations in precision farming technologies suggest that a smart combination of sensor-derived estimates of soil properties and crop yield measurements at field and farm level can lead to tailor-made and efficient fertilizer recommendations (Guerrero et al. 2021; Maleki et al. 2007; Zhang et al. 2018; Ros et al. 2021).

3.5.3 Experimental approach

Seven experimental approaches have been used to relate crop yield responses to soil nutrient availability: large datasets based on field experiments (LDF, n = 207); long-term amended field experiments (LTAF, n = 12); long-term field experiments (LTF, n = 42); short-term amended field experiments (STAF, n = 4); short-term field experiments (STF, n = 112); short-term pot experiments (STP, n = 47); and short-term pot and field experiments (STPF, n = 24). The experimental approach explains 24% of the variation in critical soil nutrient levels for P (p < 0.05). Combined with the crop type, however, it did not explain much additional variation in critical soil nutrient levels, suggesting that these factors were partly correlated. On average, the STF approach gave higher critical soil nutrient levels than the other approaches (Fig. 8); the absolute critical levels are shown in supplementary material Figure S2. Overall, field experiments had relatively higher critical values (Fig. 8) than the pot experiments, contrary to the findings of Ayodele and Agboola (1985). However, Mortvedt (1977) argues that achieving 90% relative yield in a pot versus in the field may lead to different nutrient requirements, with fewer nutrients needed to reach 90% in the pot than in the field. The field conditions imply that the greater a plant’s potential for growth, the higher the minimum soil test level required to support its growth. Furthermore, interacting factors such as climate and management might have affected the trial period, thereby altering the comparison of critical levels between the pot and field experiments. At the same time, it is also possible that losses are usually greater in the field than in a controlled environment, thus leading to a higher critical level. Based on our results, amended field experiments (soil fertility classes) are a more suitable approach to determine the critical level than short-term experiments.

Fig. 8

Impact of experimental approach on the critical level for a P, b K, and c Zn across all crops, standardized for the various extraction methods. LDF: large dataset based on field experiments; LTAF: long-term amended field experiments; LTF: long-term field experiments; STAF: short-term amended field experiments; STF: short-term field experiments; STP: short-term pot experiments; STPF: short-term pot and field experiments.

3.5.4 Statistical models

As expected, the statistical model used has a substantial impact on the critical soil nutrient level derived from field experimental data. About 20% of the variation in the critical soil nutrient levels could be explained by the model applied. Figure 9 highlights the differences observed among critical soil nutrient levels for P, K, and Zn derived by the various statistical models; the absolute critical levels are shown in supplementary material Figure S3. For instance, for P, the mean critical soil nutrient level (standardized to unit variance) declined in the order Mitscherlich, exponential, quadratic, linear, ALCC, and Cate-Nelson (Fig. 9a). A similar trend was observed for potassium (Fig. 9b). Differences in critical soil nutrient values between the Cate-Nelson, exponential, and Mitscherlich models were significant (p < 0.05). These findings are similar to earlier conclusions of Colomb et al. (2007) and Perrin (1976). For example, Colomb et al. (2007) found that Mitscherlich (exponential) models resulted in critical soil nutrient levels 1.3 to 1.8 times higher than those derived from Cate-Nelson models. In addition, linear plateau models have been suggested to result in lower fertilizer recommendations (and hence higher critical soil nutrient levels) than the curvilinear models when applied to the same dataset (Perrin 1976). In three locations for soybean and maize, the exponential model led to the highest critical levels, followed by the quadratic plateau and linear plateau models, respectively (Dodd and Mallarino 2005). In addition, nonlinear models are often preferred because of the underlying processes controlling crop development, and studies show that their explained variance exceeds that of linear models (Alivelu et al. 2003; Cox 1992). Recent studies promote the use of the Cate-Nelson model because it looks at higher yields in the positive quadrants and thus aligns with the law of the optimum, which better reflects the actual situation in the field than the historical law of the minimum (Lemaire et al. 2019).

Fig. 9

Impact of the model used on the standardized critical levels of a P, b K, and c Zn across all crops, standardized for the various extraction methods.

Except for the Cate-Nelson model, all models use a specified cut-off value for the desired yield response. The cut-off point has a substantial influence on the critical level across all elements, extraction methods, and crops, and explained about 14% of the variation in the critical soil nutrient levels. In any case, the relative yield (RY) depends on the agronomic intensity of the production system, and the choice of the RY also depends on the acceptable economic risk level (Bell et al. 2013c). For instance, 90% of a 10 t/ha yield target and 90% of a 5 t/ha yield target are different absolute yields but cannot be differentiated by looking at the relative yield.
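The dependence of the critical level on the chosen cut-off can be made concrete with a simplified Mitscherlich-type response. The sketch below is illustrative only: it assumes a one-parameter form RY = 1 - exp(-c x) with zero yield at zero soil test value, and the curvature `c = 0.1` as well as the function names are hypothetical rather than taken from the reviewed studies.

```python
from math import exp, log

def mitscherlich_ry(soil_test, c):
    """Relative yield (0 to 1) for a simplified Mitscherlich response."""
    return 1.0 - exp(-c * soil_test)

def critical_level(cutoff, c):
    """Soil test value at which the relative yield reaches the cut-off."""
    if not 0.0 < cutoff < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and 1")
    # Invert RY = 1 - exp(-c * x) for x.
    return -log(1.0 - cutoff) / c

c = 0.1  # hypothetical curvature parameter (kg mg-1)
level_90 = critical_level(0.90, c)
level_95 = critical_level(0.95, c)

# The same 90% cut-off corresponds to different absolute yields.
absolute_yields = {target: 0.90 * target for target in (10.0, 5.0)}
```

Under these assumptions, raising the cut-off from 90% to 95% increases the critical level by about 30%, independent of `c`, while the 90% cut-off itself corresponds to 9 t/ha for a 10 t/ha target and 4.5 t/ha for a 5 t/ha target.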

3.6 Disentangling the impacts of site conditions and methodological aspects on critical levels

The ranges of threshold soil nutrient levels defining the boundaries of the medium soil fertility class, as well as the range in critical soil nutrient levels observed in field experimental data, were large. This limits their applicability in FR systems outside the situation for which the critical levels have been derived. We analyzed the contribution of site conditions (approximated by the location of the experiment and crop type) and of methodological factors (the soil test, the experimental approach, the statistical model used, and the cut-off value used). Combined, site conditions and methodological aspects explained 51% of the variation in critical soil phosphorus levels observed in a wide range of experiments (Fig. 10a). However, the methodological aspects alone also explained 51% of the variation (Fig. 10b), while site conditions alone explained 30% of the variation (Fig. 10c). The contribution of individual factors to the variation in critical P limits was 29% for location, 5% for crop type, 24% for experimental approach, 20% for statistical model, and 14% for cut-off point. Methodological factors emerged as the main driver of variation in critical levels, surpassing the influence of site conditions. This is highlighted by the fact that incorporating site conditions did not notably enhance the explained variation once methodological aspects were considered (see Fig. 10a, b). Even though location had the largest individual impact on the observed variation in critical soil P levels, it did not contribute substantially once differences in methodological aspects were considered.

Fig. 10

The a–c explained variance and d–f standard error of P in the models. V0 = accounts for no factors; V1 = location; V2 = location + soil extractant; V3 = location + soil extractant + experimental approach; V4 = location + soil extractant + experimental approach + statistical model; V5 = location + soil extractant + experimental approach + statistical model + crop; V6 = location + soil extractant + experimental approach + statistical model + crop + cut-off point. M0 = accounts for no factors; M1 = soil extractant; M2 = soil extractant + experimental approach; M3 = soil extractant + experimental approach + statistical model; M4 = soil extractant + experimental approach + statistical model + cut-off point. S0 = accounts for no factors; S1 = crop; S2 = location + crop.

Considering both the methodological aspects and site conditions, the mean standard error of prediction was reduced by 53%, from ca 0.74 to ca 0.35 (Fig. 10d). A similar reduction was found when considering the methodological aspects alone (Fig. 10e), whereas the site conditions alone reduced the mean standard error by only 18%, from ca 0.74 to 0.61 (Fig. 10f). This result implies that correcting for methodological aspects can potentially reduce the range of critical values by ca 50%. Applying this reduction to the range in critical P levels, as given in Table 2, would quite strongly reduce the range (= uncertainty) of the critical soil nutrient levels defining the medium soil fertility class used in fertilizer recommendations. Reducing the range around the mean by 50% implies that the range of PBRAY1 is reduced from 9.3–40 to 17–32 mg P kg-1, PBRAY2 from 12–56 to 23–45 mg P kg-1, POLSEN from 4.9–21 to 8–17 mg P kg-1, PCOLWELL from 15–54 to 25–44 mg P kg-1, PMEHLICH1 from 13–58 to 24–47 mg P kg-1, and PMEHLICH3 from 16–55 to 26–45 mg P kg-1. A more accurate measure of the critical soil P level will evidently improve the nutrient use efficiency and avoid unnecessary build-up of P stocks in the soil and associated P losses via leaching, runoff, and erosion.
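The arithmetic behind these figures can be sketched as follows. This is a minimal illustration that assumes the range is shrunk symmetrically around its midpoint; the function names are hypothetical, and small differences from the values quoted in the text may reflect rounding or a slightly different procedure used by the authors.

```python
def relative_reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before

def shrink_range(lower, upper, reduction=0.5):
    """Shrink a range symmetrically around its midpoint.

    reduction=0.5 halves the width of the range.
    """
    midpoint = (lower + upper) / 2.0
    half_width = (upper - lower) / 2.0 * (1.0 - reduction)
    return midpoint - half_width, midpoint + half_width

# Standard errors quoted in the text (Fig. 10d and 10f).
combined = relative_reduction(0.74, 0.35)   # methodology + site conditions
site_only = relative_reduction(0.74, 0.61)  # site conditions alone

# Bray 1 range quoted in the text (mg P kg-1).
bray1_low, bray1_high = shrink_range(9.3, 40.0)
```

With the Bray 1 range of 9.3 to 40 mg P kg-1, halving the width around the midpoint of 24.65 gives roughly 17 to 32 mg P kg-1, in line with the values reported above.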

Although other attributes, such as soil organic matter and clay content, could have further reduced the critical level ranges (based on expectations from Wuenscher et al. 2015), limited data availability did not allow a more detailed assessment. We hypothesized that location could be used as a surrogate for site conditions, including soil properties and climate, but our analysis shows that it is highly correlated with methodological aspects. Without original field data on agro-ecological site conditions for the different experiments, it is impossible to disentangle their impact from that of the methodological aspects affecting the derived crop response to nutrient availability. It is likely that an even stronger decline in the uncertainty of the estimated critical P levels is possible when the exact site properties controlling the crop response are quantified. Supporting evidence can be deduced from recent initiatives where machine learning algorithms are trained to explain the crop response to variation in soil nutrient levels while accounting for agroecological site conditions such as weather, soil quality, and crop management measures (Timsina et al. 2021; Jayashree et al. 2022). Therefore, establishing a minimum dataset requirement for soil test correlation and calibration studies, as proposed by Slaton et al. (2022) and Conyers et al. (2013), would be a step towards improving fertilizer decision-making using evidence-based information.

4 Conclusions and outlook

This study confirmed that the range in critical soil nutrient levels is comparable to the range in medium soil fertility classes used in fertilizer recommendation systems. This range is large. Our study thus aimed to unravel factors influencing the derivation of critical levels in order to reduce this high uncertainty. Strong variation in observed critical values originates from methodological aspects and site conditions, together explaining 51% of the variation in the critical levels found for phosphorus. The geographical location explained most of the variation in critical P levels, followed by experimental approach, extraction method, the statistical model used, its cut-off value to assess yield level, and crop type. The uncertainty in the critical soil P level declined by a similar percentage when accounting for methodological aspects and site conditions, showing that there is potential to develop fertilizer recommendation systems with more robust estimates (more limited ranges) for the critical nutrient level above which further fertilizer addition hardly increases the crop yield. A similar reduction in the range in critical levels might be expected for the other nutrients (K, Mg, B, Mn, Mo, Cu, and Zn), but the low availability of experimental data limited the approach to derive more robust critical soil nutrient values.

This review highlights that there is a clear potential to reduce the uncertainty (in particular the observed range) in critical soil nutrient levels by correcting for differences in methodological aspects. Their impact on the variation in critical nutrient levels was much bigger than the impact of location. This confirms our hypothesis that the variation in critical soil nutrient values declines when one accounts for the methodological conditions under which these values have been derived. Our results for phosphorus indicate that a reduction of 50% in the uncertainty of critical levels is possible by harmonizing methodological aspects, implying that more generic and broadly applicable soil (P) tests are possible when such harmonization is practiced. We assumed that location was a proxy for site conditions, such as climate and soil properties, but unfortunately location appears to be entangled with methodology in our study. Consequently, the impact of variation in site conditions on critical soil nutrient levels could not be reliably derived.

As long as a correction for methodology is not implemented, the current FR systems remain limited to the conditions for which the critical soil nutrient levels have been determined. For that reason, Lemaire et al. (2019) proposed an alternative FR system using plant diagnostics to optimize fertilization practices in view of crop demand. In addition, when site conditions affect the actual plant availability of nutrients or the risk of nutrient deficiency, a sustainable fertilizer recommendation system might differentiate per nutrient for those factors. In that way, one accounts for the site conditions controlling the variation in critical soil nutrient levels required for optimum crop yield. For example, the soil P status in the German FR system is evaluated differently for six soil texture classes and two land use categories. In the long term, we see potential for generic soil tests with narrow ranges defining the medium soil fertility class. There are currently two potential approaches to improve the reliability of soil-based FR systems: (i) statistical data-driven approaches where the crop response is predicted in view of all site conditions (Chlingaryan et al. 2018; Radočaj et al. 2022; Barbedo 2019) and (ii) replacement of the empirically selected soil tests by soil test methods that reflect the mechanistic processes in soil controlling the plant availability (van Doorn et al. 2023).

The use of standard data formats for documenting experiments and modelling crop yield responses to nutrient inputs will certainly facilitate the exchange of information and the correct derivation of critical soil nutrient levels (Slaton et al. 2022). Furthermore, more attention to the interactions between nutrients, including those between macro- and micronutrients, is needed, considering that micronutrients often limit yields in large areas of Africa (Rietra et al. 2017; Kihara et al. 2017; Berkhout et al. 2017; Kihara et al. 2020). Focusing on a single element when deriving relationships between crop yields and soil nutrient levels ignores those nutrient interactions, which are relevant in influencing crop response. Though establishing a reliable database will take time and require multi-stakeholder collaboration (Lyons et al. 2021), we foresee high potential for more generic fertilizer recommendation systems that make use of reliable data, a more process-based interpretation of nutrient pools, and an accounting for the interactions among nutrients as well as the site conditions controlling the actual plant availability of nutrients (Lemaire et al. 2019).