1 Introduction

The mineral nutrition of plants is based on the absorption of inorganic ions from the soil. Mineral nutrients are then transported and distributed to aerial organs such as leaves and fruits via dedicated transporters and/or ion channels. Nutrients play diverse and critical roles in maintaining plant growth and development (Welch and Shuman 1995; Merchant 2010). However, the availability of these essential ions fluctuates in time and space due to changing environmental conditions. To cope with this constraint, plants have a wide range of adaptive responses triggered by sensing systems that perceive the external availability of mineral nutrients in the soil. Plants are able to reprogram and adjust their metabolism, growth, and development to adapt and survive. Such reactions result in changes in the underlying physiological processes that are reflected, among other things, in variations in the electrical potential (EP) of the plant (Volkov and Ranatunga 2006). Indeed, electrical signaling is a universal biological mechanism for transmitting information across the kingdoms of life. In plants, a rapid electrical response is observed as a reaction to external stimuli, either biotic or abiotic. Therefore, the analysis of plant electrical signals, called electrophysiology, has strong potential for detecting changes in the plant state by identifying patterns that are related to the applied stimuli, such as nutrient deficiencies.

Monitoring plant health is a daily routine for growers and farmers, allowing them to adjust their management and respond effectively and in a timely way to abiotic and biotic challenges, thus preventing crop loss and ensuring good quality production. Furthermore, climate change is disrupting predictable natural events, and growers are responding by monitoring their growing ecosystems more closely. For instance, nutrient availability is one of several factors that have a significant impact on plant fitness and consequently lead to losses in crop yield in both qualitative and quantitative terms (Morgan and Connolly 2013; Kalaji et al. 2017). Changes in plant appearance, such as the color and the shape of the leaves, can indicate that the plant is suffering from a nutrient deficiency. Visual symptoms of nutrient deficiencies vary depending on the required amount of the nutrient, its biological role, its mobility within the plant, etc. (de Bang et al. 2021). However, diagnosis by visual inspection can only occur once the stress is at an advanced stage. In addition, some nutrient deficiencies share common symptoms (Kumar et al. 2021). Hence, to optimize crop production, early and automated detection of nutrient deficiencies is needed in everyday agricultural practice.

Advances in digital technology now allow real-time remote sensing for precision agriculture. Many sensors are deployed in the field today to measure environmental factors such as weather conditions, soil conditions, and insect populations, but sensors that directly target a plant’s physiological state are scarce. There is a need for sensors that can detect early indicators of plant stress. Remote spectral imaging is a novel imaging method to monitor early visual signs of phenotypic traits of plants (Mishra et al. 2020). However, the appearance of these phenotypic traits is the result of physiological changes that are already happening in the plant in response to stress, thus leaving less time to react. In this context, measuring the electrical signal of agricultural crops could represent a worthwhile alternative, since electrical signals are known to occur within a second at the cellular level (Tran et al. 2018; Bouteau et al. 2020) and within a few seconds at the whole-plant level (Mousavi et al. 2013). At the whole-plant level, the sum of the electrical signals from cells is recorded as the electrical potential. Electrical signals are known to play a central role in numerous physiological processes and in systemic communication in plants (Fromm and Lautner 2007; Choi et al. 2016). Electrical signaling is the most efficient means of rapidly transferring information over long distances. These electrical potential variations have been confirmed in many agricultural plants such as cucumber, pea, soybean, cabbage, wheat, apricot, and tomato. Several studies in the current literature propose approaches based on machine learning techniques to identify patterns in plant electrophysiology signals related to the application of different biotic (Simmi et al. 2020; Reissig et al. 2021) and abiotic (Souza et al. 2017; Pereira et al. 2018) stressors. Nonetheless, most of these studies were done in a controlled environment, isolated from the surrounding electrical noise.
Recent advances in plant electrophysiology allow real-time measurement of a plant’s electrical signal in regular greenhouse conditions, i.e., outside a Faraday cage (Tran et al. 2019). Using a thin needle, the electrical potential can be monitored for several weeks without affecting plant functions (Tran et al. 2019). The electrophysiological state of the plant can thus be deduced and could provide decision support to growers.

Electrophysiological sensors have been shown to monitor, in real time, the electrical signal of plants responding to their environment (Najdenovska et al. 2021a; Tran and Camps 2021). With a pair of shielded electrodes inserted in the main stem, the electrical potential is monitored and recorded continuously at a high sampling rate (500 Hz) via an electronics board isolated from surrounding noise. Moreover, based on such recordings, a workflow employing Gradient Boosted Tree (GBT) algorithms on local signal features has been proposed for classifying the state of tomato plants infested with spider mites (Najdenovska et al. 2021b). The resulting classification model was able to distinguish the stressed state (spider mite presence) from the normal state with an accuracy of 80%.

Because of its economic and nutritional importance, tomato is the second most important vegetable crop after potato and is cultivated for its fleshy fruit. It is grown in almost every country across the globe (FAOSTAT 2022). Therefore, the aim of this study was to evaluate, by applying the proposed workflow for classification of electrophysiological signals, the possibility of differentiating a tomato plant’s normal state from the stressed state caused by the lack of specific nutrients. Among micronutrients, manganese (Mn) and iron (Fe) were chosen since they share common visual symptoms at an early stage, i.e., chlorosis of the intercostal areas of young leaves, which makes diagnosis difficult. Among macronutrients, nitrogen (N) and calcium (Ca) were selected. Indeed, N is the nutrient needed in greatest abundance by plants since it is used for protein synthesis, the plant “backbone”, and is therefore essential for plant growth. A lack of Ca for several days affects the appearance of the fruits through blossom-end rot, which is a major concern for growers. Each of these nutrients was analyzed separately. Additionally, by comparing the signal features that most clearly discriminate the plant state, possible specificities in the information carried by the plant electrical response could be identified for each of these deficits, thus allowing identification of the type of deficiency that is occurring.

2 Materials and methods

2.1 Experimental site and design

The experiment was conducted during the 2019 growing season in a greenhouse at the Agroscope research station (Conthey, Switzerland). The compartment had a floor area of 370 m² and was equipped with technology comparable to commercial greenhouses. Tomato plants (Solanum lycopersicum), variety Admiro (De Ruiter), grafted on Beaufort (De Ruiter) rootstock were used in this study. This variety is suited to soilless cultivation and has indeterminate growth, allowing tomatoes to be harvested during the whole season. Once the first two trusses are harvested, the growth stage can be considered constant, with 13/15 leaves and 7/8 trusses simultaneously at different maturation stages. Plants were grown in rockwool cubes and transplanted at the four-leaf stage, at the end of January 2019, onto organic slabs composed of bark compost (35%), a peat substitute (30%), coco peat (20%) and topsoil (15%) (Substrate 127, Ricoter, CH), located on hanging and elevated gutters. Plants were cultivated with two-trusses with a planting density of 3.8 trusses per m².

Fifteen independent tomato plants were used from April to September 2019 (4–9 months old, > 4 m high) for each of the four specific nutrient deficiency trials, which were performed separately over time, resulting in 60 plants overall. For each trial, tomato plants were initially subjected to a period with full nutrient solution, considered the control period, followed by deprivation of a specific nutrient until an advanced stage of deficiency. The duration of the trial therefore varied from 2 weeks (for N deprivation) to 5 weeks (for Mn deprivation), depending on the tested deficiency. Irrigation water and nutrients were supplied with drippers operated by a valve. Manganese, iron, nitrogen, or calcium was specifically removed from the nutrients premixed at the beginning of each deprivation experiment.

2.2 Electrophysiology

The electrical potential was recorded continuously throughout each deficiency trial, lasting several days to weeks depending on the studied nutrient. Signals were acquired as previously described (Tran et al. 2019) with the multi-channel PhytlSigns device (Vivent SA, Switzerland). The electrical potential was measured with custom-made electrodes, which consist of a coaxial cable (2.79 mm diameter) whose center conductor (silver-coated copper filament, diameter < 0.2 mm) was inserted into the main stem. Electrodes are fabricated from 50-Ω impedance coaxial cable with an inner conductor of silver-coated copper wire of diameter 0.5 mm. The outer conductor is a shielded copper braid with a waterproof jacket. Particular attention is paid to grounding throughout the instrument. The electrode is connected to a DC-coupled amplifier with appropriate filtering and noise cancellation, followed by an analogue-to-digital converter and a data logger.

In order to obtain a stable signal, the electrode should be inserted in the conducting bundles; thus, the recording was checked for 48 h following insertion and the electrode was replaced if required. Once the electrodes are inserted and signal acquisition is stable, they can be left in place for several months. The difference in electrical potential was measured between two electrodes, one placed on a higher part of the stem (active electrode) and one on a lower part of the stem (ground electrode). The recordings were stored at a sampling rate of 500 Hz.

2.3 Dataset

The data acquired within each deprivation experiment were analyzed as four separate datasets using the same methodology, described in the following sections. The study aims at discriminating the normal plant state from the stressed state caused by a deficit of a particular nutrient or, in other words, at modelling plant electrophysiological behavior in control versus strong-stress conditions. Each dataset included a selected part of the recordings from 15 tomato plants, 96 h per plant, i.e., 1440 h of recorded data per experiment. The first 48 h of each individual plant represent the optimal growth condition with complete nutrition, i.e., before the removal of one of the tested nutrients, whereas the remaining 48 h were selected from the period when the visual symptoms appeared, i.e., when the plants were visually deficient. Hence, an equal data distribution of each state was used to build a classifier for distinguishing these states in the recorded plant electrophysiological response. For early detection analysis, each model was applied to the whole recorded signal from the plants forming the respective test set, from the beginning until the end of the nutrient deprivation. The approach taken was to build models to classify “control” conditions (full nutrient) vs “strong” nutrient deficiency (visual symptoms). The studied classification output was the prediction of the control state: accuracy values above 60% were considered as “control”, those below 40% as “stressed”, and those in between as “early stressed”, since this range represents the transient state. The whole recording acquired during each experiment was used as the test dataset.
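
The three-way labelling described above can be sketched as a simple threshold rule. This is only a minimal illustration, assuming that the model output is summarized as a percentage score for the control class; the function name `label_state` and the treatment of the exact boundary values (40% and 60% are assigned to “early stressed” here) are assumptions, not details given in the text.

```python
def label_state(control_pct: float) -> str:
    """Map a control-class score (in percent) to one of three plant states."""
    if control_pct > 60.0:
        return "control"
    if control_pct < 40.0:
        return "stressed"
    # Transient zone between the two thresholds (boundaries included here,
    # which is an assumption of this sketch).
    return "early stressed"


# Hypothetical scores, for illustration only.
scores = [92.0, 55.0, 18.0]
states = [label_state(s) for s in scores]
```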

2.4 Data preprocessing

The models for identifying the presence of stress in the monitored plants were built using machine learning techniques applied to features extracted locally from the recorded data (Najdenovska et al. 2021b). To extract such features, the raw signal, i.e., the measured electrical potential in mV over time, from each chosen dataset underwent several preprocessing steps. Figure 1 shows a diagram representing the preprocessing pipeline.

Fig. 1 Diagram presenting the flow of the pre-processing procedure transforming the plant electrophysiological signal for modelling

The initial step involved notch filtering at 50 and 100 Hz to eliminate potential noise from the electric power source. The second step involved generating the samples for the modeling through a windowing procedure (Najdenovska et al. 2021b). More precisely, at steps of 5 min, seven windows of fixed size (15 s, 30 s, 1 min, 2 min, 5 min, 10 min, and 30 min) were taken from the filtered signal. Then, in each of these windows, 34 different features were calculated describing the characteristics of the respective signal in the time or frequency domain. The information carried by these signal characteristics was used to build the predictive models. The calculated features comprise:

  • Main descriptive statistical time-domain measures, such as the minimum, maximum, and variance as well as the skewness describing the asymmetry of the signal, the kurtosis characterizing the signal distribution in terms of tails, and the interquartile range (IQR) describing the spread of the distribution (Chatterjee et al. 2015; Najdenovska et al. 2021b);

  • Advanced statistical time-domain measures such as the Hjorth mobility estimating the mean frequency of the power spectrum, the Hjorth complexity representing the change in frequency or the signal’s bandwidth (Hjorth 1970), the generalized Hurst exponent (GHE) giving a measure of the signal’s long-term memory, i.e., its persistent behavior (Nigmatullin et al. 2021), the Shannon entropy (Shannon 1948) and the logarithmic wavelet packet entropy (wentropy) quantifying the degree of the signal’s randomness, and the root mean square (RMS) of the windowed signal together with the impulse, margin, shape and crest factors expressing the signal properties related to its peaks and amplitude (Caesarendra and Tjahjowidodo 2017; Ben Ali et al. 2018);

  • Statistical frequency characteristics, namely the frequency center and the root variance frequency, considered the first- and second-order moment of the Fourier spectrum, respectively, as well as the root mean square frequency (Caesarendra and Tjahjowidodo 2017).

  • Color noise features describing the similarity between the signal’s noise and colored noise, expressed by the scalar product of their respective normalized half-length frequency spectra (Najdenovska et al. 2021b). Five different colored noises were studied, namely white, blue, brown, pink, and purple noise, each characterized by a different distribution of power across the frequency spectrum. The color of the noise reflects the dynamics of the critical state of a dynamic system (Bak et al. 1987; Pereira et al. 2018).

  • Time–frequency domain characteristics given by a wavelet decomposition of order 8, performed using multiresolution analysis based on the maximal overlap discrete wavelet transform. Such a decomposition avoids subsampling and, therefore, retains a higher level of information (Ghaemi et al. 2019). The extracted features are the minimum, maximum, and mean values of the wavelet decomposition at levels 1, 4, and 8.
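
As an illustration of how some of the listed time-domain features can be computed on a single window, the sketch below uses common textbook definitions of the RMS, the crest, shape and impulse factors, and the Hjorth mobility and complexity. The exact formulas used in the cited workflow are not given in this section, so these definitions, the function names, and the synthetic sine-wave test signal are assumptions made for illustration.

```python
import math
from statistics import pvariance


def rms(x):
    """Root mean square of the windowed signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def crest_factor(x):
    """Peak absolute value divided by the RMS."""
    return max(abs(v) for v in x) / rms(x)


def shape_factor(x):
    """RMS divided by the mean absolute value."""
    return rms(x) / (sum(abs(v) for v in x) / len(x))


def impulse_factor(x):
    """Peak absolute value divided by the mean absolute value."""
    return max(abs(v) for v in x) / (sum(abs(v) for v in x) / len(x))


def hjorth_mobility(x):
    """Square root of var(first difference) / var(signal)."""
    dx = [b - a for a, b in zip(x, x[1:])]
    return math.sqrt(pvariance(dx) / pvariance(x))


def hjorth_complexity(x):
    """Mobility of the first difference divided by mobility of the signal."""
    dx = [b - a for a, b in zip(x, x[1:])]
    return hjorth_mobility(dx) / hjorth_mobility(x)


# Synthetic window: 10 full periods of a unit sine over 1000 samples.
signal = [math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]
features = {
    "rms": rms(signal),
    "crest": crest_factor(signal),
    "shape": shape_factor(signal),
    "impulse": impulse_factor(signal),
    "mobility": hjorth_mobility(signal),
    "complexity": hjorth_complexity(signal),
}
```

For a pure sine wave, the crest factor is close to the square root of 2 and the Hjorth complexity is close to 1, which gives a quick sanity check of such an implementation.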

All the features calculated within these seven windows represent a single sample of the data to model for a given plant. Hence, each sample contained 238 features in total (7 windows × 34 features). The concatenation of these samples resulted in a matrix where each row represented a feature vector. A min–max normalization was applied to each feature vector to compensate for possible inter-plant variability. A relatively broad range of features calculated within different window lengths was explored to better understand the signal information and the temporal extent characterizing the presence of stress in the recorded signals. This addresses the lack of a priori knowledge in this field (Najdenovska et al. 2021b).
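
The bookkeeping behind the sample layout can be sketched as follows. The window lengths, the number of features per window and the 500 Hz sampling rate come from the text; the feature naming scheme is hypothetical, and applying the min–max scaling per feature column of a given plant is one possible reading of the normalization step, not a confirmed detail.

```python
SAMPLING_RATE_HZ = 500
WINDOW_SECONDS = [15, 30, 60, 120, 300, 600, 1800]  # 15 s to 30 min
FEATURES_PER_WINDOW = 34

# Number of raw samples covered by each window length.
window_samples = {w: w * SAMPLING_RATE_HZ for w in WINDOW_SECONDS}

# Hypothetical feature names: one entry per (window, feature) pair.
feature_names = [
    f"w{w}s_f{k:02d}"
    for w in WINDOW_SECONDS
    for k in range(1, FEATURES_PER_WINDOW + 1)
]


def min_max(column):
    """Scale one feature column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Illustrative column of values for one feature of one plant.
scaled = min_max([2.0, 4.0, 6.0])
```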

The described preprocessing procedure transformed the 96 h raw signal recording of each plant into a modeling dataset of 1 152 samples. Thus, for each experiment, the modeling dataset comprised 17 280 samples in total, where exactly half of them were labelled as “normal” and the other half as “stressed”. To reduce a possible tendency of the models built for each experiment to overfit, the initial feature space of 238 elements was reduced to a subset containing only the mutually uncorrelated features (correlation < 95%, significance level < 5%) that, at the same time, are correlated with the label vector (correlation > 1%). Moreover, features whose values remained constant were also excluded.
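
The sample counts follow directly from the 5 min step, and the correlation-based reduction can be approximated by a simple greedy filter. The sketch below is an assumption-laden illustration: it uses the Pearson correlation, keeps features in their input order, and omits the significance test mentioned above; the function names and the toy data are hypothetical.

```python
import math

STEP_MINUTES = 5
HOURS_PER_PLANT = 96
PLANTS_PER_EXPERIMENT = 15

samples_per_plant = HOURS_PER_PLANT * 60 // STEP_MINUTES
samples_per_experiment = samples_per_plant * PLANTS_PER_EXPERIMENT


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, labels, max_mutual=0.95, min_label=0.01):
    """Greedy filter: drop constant features, features uncorrelated with the
    labels, and features too strongly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if max(values) == min(values):
            continue  # constant feature
        if abs(pearson(values, labels)) <= min_label:
            continue  # carries no linear information about the label
        if any(abs(pearson(values, columns[k])) >= max_mutual for k in kept):
            continue  # redundant with a feature that was already kept
        kept.append(name)
    return kept


# Toy data (hypothetical): four features, four samples, binary labels.
toy_columns = {
    "a": [0.0, 1.0, 2.0, 3.0],
    "a_copy": [0.0, 2.0, 4.0, 6.0],
    "const": [5.0, 5.0, 5.0, 5.0],
    "b": [1.0, 0.0, 1.0, 3.0],
}
toy_labels = [0, 0, 1, 1]
selected = select_features(toy_columns, toy_labels)
```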

2.5 Modelling of classifiers

The classification models for each experiment were built using the Gradient Boosted Trees (GBT) algorithm (Chen and Guestrin 2016) based on previous studies carried out on tomato (Najdenovska et al. 2021b). To do this, the dataset was separated into two parts: the learning set comprising the samples of 12 plants (representing 80% of the data) and the test set comprising the remaining 3 plants (20% of the data), which reduced the evaluation bias. The GBT parameters, namely the number of trees, the maximum depth and the learning rate, were tuned using a grid search within the learning set. The tested values for each parameter were [100, 200, 300], [4, 7, 9] and [0.05, 0.1, 0.3], respectively. The tuning included a cross-validation with 12 folds, each fold corresponding to the samples of one plant. The parameters that provided the highest accuracy were chosen. This accuracy will be further referred to as training accuracy. To evaluate the performance of the built classifiers, several measures were calculated on the test set, namely accuracy, precision, recall, and specificity.
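
The structure of this tuning and evaluation scheme can be sketched without any particular GBT library. The code below only enumerates the grid points and the per-plant folds and computes the four reported measures from predictions; it assumes that the integer values [4, 7, 9] are tree depths and the fractional values are learning rates, that “stressed” is the positive class, and that plants are identified by hypothetical integer ids. Model fitting itself is deliberately left out.

```python
from itertools import product

# Assumed mapping of the tested values to the three tuned parameters.
PARAM_GRID = {
    "n_trees": [100, 200, 300],
    "max_depth": [4, 7, 9],
    "learning_rate": [0.05, 0.1, 0.3],
}


def grid_points(grid):
    """All parameter combinations of the grid as dictionaries."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[n] for n in names))]


def leave_one_plant_out(plant_ids):
    """One fold per plant: (training plants, held-out plant)."""
    return [([p for p in plant_ids if p != held_out], held_out)
            for held_out in plant_ids]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and specificity for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


learning_plants = list(range(1, 13))  # 12 hypothetical plant ids
combos = grid_points(PARAM_GRID)
folds = leave_one_plant_out(learning_plants)

# Hypothetical labels and predictions (1 = stressed, 0 = normal).
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 0, 1, 1])
```

Splitting by plant rather than by sample keeps all windows of one plant on the same side of each fold, which is the property the 12-fold scheme described above relies on.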

2.6 Features importance

The GBT algorithm also provides the importance of each feature, which is assessed as the average of its contribution to both splitting and improving the gain with the respective split during the construction of the trees (Rifkin and Klautau 2004). For the purpose of real-time prediction that can be used as a support tool for growers, and in order to reduce processing time, the minimal set of features required to predict a deficiency was assessed. For this, the 10 most discriminative features used by the GBT algorithm were selected and analyzed to further explore the features capable of identifying the stressed state and the related classification performance. More precisely, for each of the four datasets individually, the classification process was repeated using only that set of features. In the next step, the least important feature out of the n was eliminated, and a new model was built on the remaining n-1. This procedure, i.e., a classification step followed by elimination of the least discriminative feature, was repeated until the set comprised only one feature, the most discriminative one.
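
The iterative reduction described above amounts to a backward elimination loop. The sketch below shows its control flow only: `fit_and_rank` is a hypothetical callback standing in for training a GBT model and returning its accuracy and per-feature importances, and whether importances are recomputed at every step (as done here) is an assumption. The toy scorer exists purely to make the example runnable.

```python
def backward_elimination(features, fit_and_rank):
    """Repeatedly fit a model and drop the least important feature.

    Returns a list of (feature_subset, accuracy) pairs, from the full
    starting set down to a single remaining feature.
    """
    history = []
    current = list(features)
    while current:
        accuracy, importance = fit_and_rank(current)
        history.append((list(current), accuracy))
        if len(current) == 1:
            break
        least = min(current, key=lambda name: importance[name])
        current.remove(least)
    return history


# Toy stand-in for model training (hypothetical): each feature has a fixed
# weight, and the "accuracy" is the share of total weight that is retained.
TOY_WEIGHTS = {f"f{i}": float(i) for i in range(1, 11)}


def toy_fit_and_rank(subset):
    total = sum(TOY_WEIGHTS.values())
    accuracy = sum(TOY_WEIGHTS[name] for name in subset) / total
    return accuracy, {name: TOY_WEIGHTS[name] for name in subset}


history = backward_elimination(list(TOY_WEIGHTS), toy_fit_and_rank)
subset_sizes = [len(subset) for subset, _ in history]
```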

3 Results

3.1 Appearance of visual symptoms

Tomato, like other agricultural crops, may be subject to nutritional disorders and, depending on the specific nutrient, visual symptoms usually appear several days or weeks after the onset of the deficiency. The effects of specific nutrient deficiencies on tomato in soilless culture were investigated. Figure 2 shows the sensor setup and the appearance of the first visual symptoms on the tomato plants. Visual inspection was performed daily. Lack of N supply in fertigation typically results in leaves with a uniformly pale green to yellow color. Symptoms were seen in the older leaves 4 days after depletion. For Ca (blossom-end rot) and Fe (interveinal chlorosis), the symptoms appeared 9 and 12 days after depletion, respectively. The longest delay before visual manifestation of a deficiency was observed with Mn, almost 3 weeks after applying the deficiency. The young leaves showed chlorotic zones in the intercostal areas. Since the availability of nutrients is crucial for plant growth, development and consequently crop yield, early assessment of nutrient imbalance is of strong interest for growers.

Fig. 2 a Electrode inserted in the main stem of tomato plant (top) linked to the device (bottom). b Typical visual symptoms observed on tomato plants after specific depletion of manganese (Mn), iron (Fe), nitrogen (N) or calcium (Ca) in the fertigation system. The appearance of these symptoms differs depending on nutrient deficiency

3.2 Electrical signal patterns

Long-term variations of electrical potentials (EP) were monitored in response to four different nutrient deficiencies starting from full nutrient condition, switched to nutrient deficiency condition until the end of deprivation. During full nutrient conditions (Fig. 3, “Control state”), electrical potential displayed daily variations with a higher potential during daytime compared to night time as previously described (Tran et al. 2019). These variations were shown to be linked to nycthemeral rhythm with higher metabolism during day (Oyarce and Gurovich 2010; Ríos-Rojas et al. 2015). Depending on the applied nutrient depletion, the EP showed a modification in daily variations (Fig. 4). The baseline showed a hyperpolarization or reduction in level after shortages of Mn, Fe and Ca; whereas a depolarization or increase in level was observed in response to N depletion (Fig. 4a). Concerning the daily amplitudes, a significant diminution was observed for all nutrients tested except for Ca which showed no significant change.

Fig. 3 Representative electrical signals acquired in hydroponic tomato plants in soilless culture grown in a greenhouse. Two typical days are shown in control (left) and stressed state (right) i.e. appearance of visual symptoms after specific depletion of a manganese (Mn), b iron (Fe), c nitrate (N) or d calcium (Ca) in the fertigation system. The signal is presented at 1 point per minute

Fig. 4 Effect of specific depletion of manganese (Mn), iron (Fe), nitrogen (N) or calcium (Ca) in the fertigation system on (a) the baseline and (b) the amplitude of bioelectrical signals. Results are the difference between control and stressed states. Bars represent mean ± s.e.m, n ≥ 24

3.3 Model performance

To identify eventual patterns in plant electrical responses differentiating the normal from the stressed state triggered by the lack of a nutrient, classification of electrophysiological signals was applied on data acquired from tomato plants growing in soilless culture submitted to specific nutrient deficiencies, namely Mn, Fe, N or Ca. The performance of the respective classification models is summarized in Table 1. Overall, the Gradient Boosted Trees (GBT) models performed with F1-score higher than 79% for each nutrient deficiency. Among the datasets, the Fe stressor displayed the lowest model performance (79.7%) whereas the highest accuracy was obtained for deficit of N (92.9%). Lack of Mn and Ca in the fertigation solution resulted in accuracy of 85.0% and 81.1%, respectively.

Table 1 Performance summary of the classification models trained on the initial features set

Along with the accuracy, Table 1 also shows the training accuracy representing the average over the accuracies obtained for each fold for the chosen values of the GBT parameters, as well as the precision, recall and specificity for each of the trained models, respectively. The relatively high values of precision, recall and specificity portray the fact that each of the classifiers is able to predict both plant states, stressed and normal, in the same, unbiased, manner. The exceptions to this are the models built on the Mn dataset, where the very high recall values and relatively lower specificity indicate that the classifiers are more biased to predict the stressed state over the normal state.

To reduce a possible tendency to overfit, a correlation-based selection of features was undertaken for each dataset. The resulting sets included from 103 to 124 features, depending on the stressor (Table 1). The complete lists of features retained for each dataset are given in Tables S1-S4 of the Supplementary Information.

3.4 Models built with the most discriminative features

To further explore the most discriminative features for the trained GBT models and the related classification performance, additional modelling was done on feature sets comprising solely the most discriminative features. The reduction of the feature space led to a slight decrease in accuracy compared with the original models. These results are summarized in Fig. 5 for each deficit dataset. The figure shows the changes in model performance as the number of discriminative features is decreased iteratively from a set of 10 to a single feature. For all nutrients tested, good classification performance was maintained despite a substantial reduction in the number of features.

Fig. 5 Classification performance with feature sets comprising up to the 10 most discriminative features used by the GBT algorithm for models trained with datasets obtained from tomatoes grown after specific depletion of a manganese (Mn), b iron (Fe), c nitrate (N) or d calcium (Ca) in the fertigation system. Grey bars and colored curves represent training and test accuracy, respectively. Dashed lines represent model accuracy using the initial feature set for training (grey) and testing (colored) for the respective models

In the case of Mn deficit, the classification performance starts to decrease when seven features are used, with a considerable reduction for three or fewer features (Fig. 5a). For the models built on the dataset representing the Fe deficit, the stressed state is still clearly recognized with only five features (Fig. 5b). For the dataset representing the Ca deficit, the models perform better at identifying the stressed state; when the set includes either three or four features, the discrimination of each class becomes approximately equal, but the overall performance is lower (Fig. 5c). Finally, for the models built for discriminating the normal from the stressed state caused by the N deficit, the performance starts to decrease when the feature set includes five or fewer features (Fig. 5d).

The findings presented in Table 2 show that specific combinations of features are used to discriminate the different deficits. These features represent different types of information and are extracted from different window lengths. Some features are common to several deficits, namely skewness, which appears for three different datasets. It is noteworthy that, among the most discriminative features, the generalized Hurst exponent (GHE) is represented in all datasets.

Table 2 Importance of the minimum features required for discriminating the stress caused by each nutrient deficiency

Altogether, the models built on reduced feature sets, enclosing several important features, provided highly accurate predictions. In fact, even though there is a slight decrease in accuracy compared with models built on the initial features sets, the training accuracy is, in general, closer to the accuracy when using these considerably reduced sets. With this a posteriori reduction of the feature space, the complexity of the model decreases as does the required computing time and could be therefore implemented for real-time prediction for growers.

3.5 Specificity of nutrient deficit identification

To assess the ability of the trained models to identify only the stressed state triggered by the related specific deficit, as opposed to recognizing stress caused by the other deficits, each model was applied to the test sets of the other three datasets in a cross-comparison.

The accuracies of each model evaluated on the four different test-sets, each representing a specific deficit, for both the initial set of features and the reduced set comprising the most discriminative features, are given in Table 3 and Table S5. Apart from the test-set of the related stressor, the models perform poorly in classifying the data representing the other deficits. However, even though the accuracy remains lower, the models built for one of the macronutrients, either N or Ca, perform better on data representing the deficit of the other macronutrient than on data related to the deficit of micronutrients, especially when using the large feature set. Similarly, the models built for the Fe deficit classify the Mn test-set data better than those related to the macronutrient deficiencies. In contrast, the models built on the Mn deficit data show very weak classification performance on the test-set related to the Fe deficit.

Table 3 Models’ performance evaluated on the four different test-sets representing a specific deficit

3.6 Evaluation of a potential early prediction

The next step was to investigate whether the developed models were able to detect an early stage of the respective nutrient deficiency prior to the appearance of the actual visual symptoms (Fig. 2). The studied classification output was the prediction of the “control” state: accuracy values above 60% were considered as “control”, those below 40% as “stressed”, and the transient range between 40 and 60% as “early stressed” (Fig. S1). Figure 6 summarizes the timing of the model prediction relative to the appearance of visual symptoms. Among all algorithms tested, the one built for N performed poorly as a tool for early detection of deprivation, identifying the deficit only at the same time as the leaves turned yellow (Figs. 2 and 6). In contrast, for the other nutrient deficiencies, the trained classification models were able to predict the lack of nutrients in the fertigation system well before visual symptoms appeared (Fig. 6). The shortest latency was observed for Ca, with less than 1 day needed to predict a deficiency, whereas visual symptoms appeared after 9 days. For Fe and Mn, the algorithm detected a nutritional disorder after 3 and 4 days, respectively. In addition, the Mn-trained model provided the largest gain in reaction time for growers, detecting the deficiency more than 2 weeks before visual symptoms appeared. It should be noted that an adaptation and/or compensation mechanism can be observed for Mn and Fe (Fig. S1). Altogether, these results demonstrate that the electrical potential recorded on tomatoes grown in a commercial greenhouse provides strong potential for the use of machine learning techniques for early detection of a lack of nutrients.
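
The comparison between model detection and visual symptoms can be expressed as a small calculation. The sketch below assumes that the classifier output is aggregated into one control-class score per day of deprivation, which is not stated in the text; the function names and the daily series are hypothetical, while the 3-day and 12-day values used in the example are those reported above for Fe.

```python
def first_detection_day(daily_control_pct, control_threshold=60.0):
    """First day (1-based) on which the score is no longer above the
    control threshold, or None if the plant is always seen as control."""
    for day, pct in enumerate(daily_control_pct, start=1):
        if pct <= control_threshold:
            return day
    return None


def lead_time_days(detection_day, visual_symptom_day):
    """Days gained by the model relative to visual inspection."""
    return visual_symptom_day - detection_day


# Hypothetical daily scores after the start of a deprivation.
daily_scores = [88.0, 74.0, 57.0, 41.0, 35.0]
detected_on = first_detection_day(daily_scores)

# Fe example with the values reported in the text: detection after 3 days,
# visual symptoms after 12 days.
fe_lead = lead_time_days(3, 12)
```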

Fig. 6 General temporal scheme representing prediction carried out on the test dataset acquired from tomatoes grown with specific depletions of manganese (Mn), iron (Fe), nitrogen (N) or calcium (Ca) in the fertigation system. Blue arrows show the time of the model prediction, whereas red arrows show the first appearance of visual symptoms

4 Discussion

This study shows that plant electrophysiology signals can be used to identify a stressed state, in commercial tomato crops, related to lack of a specific nutrient. More precisely, for each of the analyzed nutrient deficiencies, the trained classification models built based on a previously proposed workflow (Najdenovska et al. 2021b), are able to identify the stressed state with an accuracy higher than 79% (Table 1).

4.1 Long-term memory temporal features importance

The trained classification models also allow the identification of a set of features as the most discriminative for each studied stressor. The diversity of the most discriminative features for each nutrient deficiency could be related to the differences in plant reactions to the lack of a specific nutrient. For some deficiencies, the GBT algorithms find more decisive patterns in the temporal features and, for others, in the frequency-related features. Nevertheless, GHE, a temporal feature portraying the signal’s long-term memory, is present among the most discriminative features in all tested nutrient deficiencies, but with different weights (Table 2). For instance, calculated within windows of 30 min, it is remarkably predominant for distinguishing the stress related to Mn deficiency, whereas its weight is considerably lower for the stress related to either N or Ca deficiency. Additionally, this temporal feature was previously shown to be important in drought stress (Tran et al. 2019) and spider mite infestation (Najdenovska et al. 2021b) using electrophysiological signals in tomatoes. This strongly suggests that GHE could represent a common feature related to plant stress in general. Moreover, reported findings also reveal the importance of GHE for the analysis and classification of human bioelectrical signals, such as the electrocardiogram (Karegar et al. 2017) and the electroencephalogram (Lahmiri 2018; Hu et al. 2019). Additionally, the low-level wavelet-based decomposition of the signal, representing its fast variations, is important for identifying the stress caused by the deficit of both Fe and N. However, differences are observed in the length of the window portraying the discriminative information, suggesting further distinct plant reactions for deficiencies of different nutrients. For instance, the characteristic variations at high frequencies are represented by small intervals of 15 s when the stress is triggered by a N deficit, while in the case of Fe deficiency, they are portrayed within intervals from several minutes up to 30 min, which is the other explored temporal extreme. Literature evidence suggests that biological signals exhibit important non-stationary characteristics (Seven et al. 2013; Shen and Lin 2019). In addition to these features, the modelling of brown noise appears as the most important feature for identifying stress from a Ca deficit. This could further support the recently introduced assumption that different stressors trigger changes in the color noise of the signal (Souza et al. 2017; Pereira et al. 2018).
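To make the windowed long-term memory feature more concrete, the sketch below estimates a Hurst-type exponent within fixed-length windows. It assumes that GHE denotes the generalized Hurst exponent and uses a common textbook estimator based on the scaling of increment moments; it is not necessarily the estimator used in the referenced workflow. The function names, the moment order `q`, the maximum lag, and the window length in samples are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def generalized_hurst_exponent(signal: Sequence[float], q: float = 2.0,
                               max_lag: int = 19) -> float:
    """Estimate H(q) from the scaling E[|x(t+tau) - x(t)|^q] ~ tau^(q*H(q)).

    The slope of log(moment) against log(tau) is fitted by ordinary
    least squares and divided by q.
    """
    if len(signal) <= max_lag + 1:
        raise ValueError("signal is too short for the requested max_lag")
    log_lags, log_moments = [], []
    for lag in range(1, max_lag + 1):
        diffs = [abs(signal[i + lag] - signal[i]) ** q
                 for i in range(len(signal) - lag)]
        moment = sum(diffs) / len(diffs)
        if moment <= 0.0:
            raise ValueError("signal shows no variation at this lag")
        log_lags.append(math.log(lag))
        log_moments.append(math.log(moment))
    mean_x = sum(log_lags) / len(log_lags)
    mean_y = sum(log_moments) / len(log_moments)
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log_lags, log_moments))
    var = sum((x - mean_x) ** 2 for x in log_lags)
    return (cov / var) / q


def windowed_ghe(signal: Sequence[float], window_size: int,
                 q: float = 2.0, max_lag: int = 19) -> List[float]:
    """Compute the exponent on consecutive non-overlapping windows."""
    return [generalized_hurst_exponent(signal[start:start + window_size],
                                       q, max_lag)
            for start in range(0, len(signal) - window_size + 1, window_size)]


# Synthetic random walk used only as a sanity check; not study data.
random.seed(0)
walk = [0.0]
for _ in range(3999):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
h_walk = generalized_hurst_exponent(walk)
h_windows = windowed_ghe(walk, window_size=1000)
```

For an uncorrelated random walk the estimate should lie near 0.5, with larger values indicating persistent (long-memory) behaviour. In practice the window length in samples would follow from the sampling rate and the 15 s to 30 min windows mentioned above.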

4.2 Minimum features required to predict

With a posteriori reduction of the feature space, both the complexity of the model and the required computing time decrease. However, a trade-off should be considered, since very simple models are usually unable to capture the dominant trend within the data. Additionally, the lack of relevant information also results in poor performance, and a combination of features is often more discriminative than any individual feature. In this case, taking only a relatively small set of the most discriminative features, which reduces the size of the initial feature space approximately tenfold, keeps the model performance relatively high (Fig. 4 and Table 2). It should be noted that specificity, which reflects the control state, is reduced in all nutrient models tested (Table S5). Hence, the computational time and processing power required to predict the plant state could be significantly decreased. This creates a high potential for deploying the respective predictive models in real-time scenarios for continuous crop monitoring, once the sensor is installed on a plant for the whole season.
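The a posteriori reduction can be illustrated with a minimal sketch that keeps roughly one tenth of the features, ranked by importance. It assumes that per-feature importance weights are already available from a trained model; the function names, the feature names, and the weights in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def select_top_features(importances: Dict[str, float],
                        keep_fraction: float = 0.1) -> List[str]:
    """Return the most important feature names, keeping about keep_fraction.

    At least one feature is always kept.
    """
    n_keep = max(1, round(len(importances) * keep_fraction))
    ranked = sorted(importances, key=lambda name: importances[name],
                    reverse=True)
    return ranked[:n_keep]


def reduce_rows(rows: Sequence[Dict[str, float]],
                kept: Sequence[str]) -> List[Dict[str, float]]:
    """Project each feature row onto the kept feature names."""
    return [{name: row[name] for name in kept} for row in rows]


# Hypothetical importance weights for 20 features; not taken from Table 2.
example_importances = {f"feature_{i}": float(i) for i in range(20)}
kept_features = select_top_features(example_importances)

# One hypothetical feature row reduced to the kept features.
example_rows = [{name: 1.0 for name in example_importances}]
reduced_rows = reduce_rows(example_rows, kept_features)
```

A model retrained on the reduced rows would then need to be re-evaluated, since the text notes that specificity drops after the reduction.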

In addition to the findings revealing that different features can discriminate different deficits, the analysis showed that each of the trained models is specific for identifying the stress caused by the related deficit (Table 3). Indeed, for proper crop growth, 18 nutrients are essential and play diverse, critical roles in different physiological processes. Each nutrient is targeted to a specific area for a specific function and, consequently, the respective perception and signaling-related events are also specific (Hänsch and Mendel 2009). It is noteworthy that, according to the trained models, the stress caused by the deficit of the micronutrients is different from the stress triggered by the lack of the considered macronutrients. The observed affinity between models built for N and Ca suggests a potential similarity in the electrical plant response to the deficit of either one of these macronutrients. Such findings could be associated with the fact that macronutrients are required in larger amounts by plants (Kirkby 2012). Calcium improves plant vigor and is implicated in cell wall formation and stabilization; it therefore plays a role in growth (Thor 2019). Nitrogen plays a fundamental role through its presence in nucleic acids (DNA), amino acids (proteins) and chlorophyll; therefore, it is vital for photosynthesis (Xiong et al. 2018). Hence, both N and Ca are targeted to growing points such as the apex (stem and root) or fruits, which possibly share common signaling pathways encoded in electrical signal features. Considering the two tested micronutrients, this affinity-related behavior is only observed for the model trained on the Fe deficit data when classifying the stress caused by the Mn deficit, implying that the signal patterns of plant responses to the deficit of Fe are potentially enclosed in the response to the deficit of Mn. As divalent ions, Fe (López-Millán et al. 2013) and Mn (Schmidt et al. 2016; Andresen et al. 2018) act as enzyme cofactors or as metals with catalytic activity in biological clusters. Once taken up from the soil, they are assimilated into plant tissue via similar transporters and, furthermore, evidence shows that compensation mechanisms exist between various micronutrients (Zhang et al. 2019; Alejandro et al. 2020). Both ions are important in chlorophyll formation; hence, visual symptoms appear as chlorosis in crops, which underlines that visual diagnosis of deficiency symptoms serves only as a guide to possible nutrition-related problems.

4.3 Early detection

The approach used to build the models was to classify control (full nutrient) and “strong” nutrient deficiency (visual symptoms) conditions. The acquisition of the signal was done on the same cohort of plants for each studied nutrient deficiency. Even though the data representing control conditions were not acquired in parallel, the model performance on the unseen dataset (test dataset) gave a high prediction rate for both control and “strong stress”, portrayed by the specificity and the recall value, respectively (Table 1). In addition, the prediction with the trained models on data representing the period before visual symptoms showed significant early detection of stress related to Mn, Ca and Fe deficiencies, whereas for N, the model predicted a stressed state at the same time as visually detectable stress (Fig. 5). Nitrogen, as a primary macronutrient, is required in the greatest amount among all nutrients and, therefore, its deficiency manifests faster, namely after 4 days in the tested conditions (Fig. 1). Because N is a mobile nutrient, it can be remobilized to areas of greater demand, typically from old leaves to new growth, causing deficiency symptoms first on older leaves. In contrast, non-mobile nutrients such as Ca, Mn or Fe are fixed and cannot be relocated within the plant. Therefore, evidence of a deficiency typically appears on new growth (Gerloff 1987; Kalaji et al. 2014). The results suggest that early detection of deficiencies in non-mobile nutrients could be achieved with electrical signal modelling. It is noteworthy that an early transient “stress state” was predicted 3 and 4 days after a shortage of Fe and Mn, respectively, was imposed. It was followed by an adaptation state (Fig. 5 and Fig. S1). Plants have developed sophisticated regulatory systems to ensure uptake of all essential nutrients. The response to nutrient starvation is guided by a complex signaling network and involves metabolic adjustments (Schachtman and Shin 2007). The most successful prediction was found for calcium deficiency, with a model prediction less than 1 day after its removal from the nutrient solution (Fig. S1). Calcium is known to play a fundamental role in biological processes and, for tomato crops in particular, a lack of calcium leads to blossom-end rot if the deprivation lasts several days (Tonetto de Freitas et al. 2014). This disorder is of economic importance for growers because affected fruits are non-marketable; early detection is therefore important.

Future investigations should confirm and consolidate these findings on an extended cohort of tomato plants. An evaluation on other crops and on other nutrients should also be completed to better understand plant nutrition. Additional studies could include recordings made before the appearance of the visual symptoms, to refine the distinction between different stress levels. Furthermore, datasets acquired for analyzing the deficit of an individual nutrient could be combined to train a classifier recognizing stress related to any nutrient deficiency in general. The findings presented indicate that such an analysis should include features from both the temporal and frequency domains, calculated for different window lengths.

5 Conclusion

Overall, this study suggests a novel path toward early and automated detection of nutrient deficiencies before visual symptoms appear, which could lead to more optimal crop nutrition and, consequently, enhanced crop production in terms of both quality and quantity. A semi-invasive electrical signal sensor could serve as a real-time monitoring system of the nutritional status of crops, allowing timely application of fertilizers to optimize growth and yield at different periods of the plant’s life cycle and supporting precision crop management.

Supplementary data are available online.