Introduction

Archaeological pulses have been the object of little archaeobotanical research, in part due to the secondary role they played with regard to cereals in traditional agro-food cycles. This dearth of research likewise stems without a doubt from the difficulty of identifying archaeological seeds by the criteria of shape or size. The difficulty of their taxonomic identification may have been an additional reason. The problem of identifying pulse seeds is not limited to archaeobotanical research (Butler 1990; Sarpaki and Jones 1990; Jones 1992; Sarpaki 1992; Tarongi et al. 2020) as it also concerns present-day seeds (Franco Jubete 1991; Kirkbride et al. 2003; De Ron 2015). Because pulses exhibit a large intra-specific variability and a low inter-specific diversity, sometimes the taxonomic identification below the genera level is challenging, while at other times some identifications made have proven to be unreliable, as in the case of Lathyrus cicera and sativus (grass pea), currently classified as Lathyrus cicera/sativus (Mahler-Slasky and Kislev 2010).

Traditional morphometrics data have occasionally served to support taxonomical identifications in archaeological pulse seeds (Buxó 1991, 1997).

However, this type of analysis has not proven to be very effective, precisely due to the variability of their sizes. It is for this reason that seed shape has served as the main guideline of identification (Jones 1992; Buxó 1997; Alonso i Martínez 1999; Rovira Buendía 2007; Filatova et al. 2019). However, as this is carried out optically it is prone to subjectivity, which can lead to potential errors or inconsistencies between authors among similar species. This is the case of certain species of Lathyrus or Vicia and the genera of Vicia, Lathyrus, and Pisum.

In identifications by shape, the application of geometric morphometry has been very effective in determining archaeological seeds within the same species as in the cases of Vitis - grape (Bouby et al. 2018, 2021), Hordeum - barley (Ros et al. 2014; Wallace et al. 2019), Olea - olive (Newton et al. 2006; Terral et al. 2021), and Phoenix - date palm (Terral et al. 2012; Sallon et al. 2020). We have previously shown its efficacy on archaeological pulses (Tarongi et al. 2021), which suited a larger-scale analysis, applying this technique to identify seeds of archaeological pulses whose species was not clear. Modern morphometry obtains good species-level predictions for both modern and archaeological seeds, and may even go below species levels, even in legumes (Tarongi et al. 2021).

In this paper we present the results of applying predictive models obtained from traditional and modern morphometry data from different sets of cultivated legume seeds, both contemporary and archaeological without taxonomic uncertainty, to seeds of this family with unclear identifications. All the archaeological individuals are from Bronze and Iron Age (second-first millennium bc) sites in the Western Mediterranean. The cultivated legume species identified in this geo-chronological framework are Lathyrus cicera and L. sativus, Lens culinaris (lentil), Pisum sativum (pea), Vicia ervilia (bitter vetch), V. faba (broad bean), and V. sativa (vetch) (Fig. 1). Cicer arietinum (chickpea) has also been identified, but with very few individuals, so it has been discarded. This study is thus centred on these six contemporary and archaeological taxa.

Fig. 1
figure 1

Archaeological species examined in this study. Lathyrus cicera/sativus, Lens culinaris, Vicia ervilia and V. faba are from the site of Turó de la Font de la Canya (Avinyonet del Penedés, Barcelona); Pisum sativum is from Eras del Alcázar (Úbeda, Jaén.) and V. sativa from Bastida de les Alcusses (Moixent, Valencia). Examples of the biometric measurements serving for the study are detailed for Lathyrus cicera/sativus (upper left); scale bars = 1 mm

The correct identification of these species can help us to interpret the role of leguminous plants throughout the Bronze and Iron Ages of the Western Mediterranean. It is in this chronological framework that concentrations of these species appear for the first time in this area. The analyses thus serve to determine whether certain species native to this area were cultivated or not. On the other hand, the remains of certain allochthonous species are very scarce and similar to other species, and could have been incorrectly identified. This type of analysis can thus shed light on these questions and assist in determining whether certain species of rain-fed leguminous plants were cultivated since the Neolithic in very small proportions or whether their cultivation only really took place from the Iron Age onwards.

Materials and methods

Contemporary and archaeological materials

The contemporary pulse seeds used to generate the predictive models were obtained from different institutions and seed enterprises. These correspond to 1,130 individuals of six cultivated legume species (ESM 1.1), with two different varieties of Lens culinaris, which have been analysed through geometric morphometry, representing a total of 7 taxa. We have used current varieties that are as similar as possible in shape and size to the archaeological ones.

These seeds were charred at different temperatures, for different periods of time, and under different atmospheres. 879 of them were recovered, while 251 were discarded since they were too heavily damaged during this process. With this protocol we wanted to reflect the different processes by which the archaeological seeds could be charred, covering more faithfully the variability of deformations observed. The recovered seeds were then once again subjected to geometric morphometry analyses so as to obtain two different data sets, one in their original state and one once carbonized.

The archaeological seeds came from 23 different Bronze and Iron Age sites located throughout the Western Mediterranean (Fig. 2). They were separated into two groups, notably 6,883 clearly identified individuals from 21 sites (ESM 1.2) and 389 of uncertain identification from 11 sites (ESM 1.3). Each were then subjected to geometric morphometric analyses and each group was differentiated from the other in order to only apply the subsequent predictive models to the doubtful cases.

Fig. 2
figure 2

Map indicating the location of the Bronze and Iron Age sites yielding the pulses analysed in this study

Data acquisition

We followed Tarongi et al. (2021). For all seeds, two orthogonal photographs were taken. For modern material this was done before and after charring. The ventral view was defined as that where the radicle and the hilum, or its scar, occupy the centre of the image, orienting the radicle towards the upper part of the image, placing the thread or its scar at the top. The lateral view is the shape visible when rotating each seed 90º to the right from the ventral view (Fig. 1). Each seed was placed manually and photographed individually (Fig. 3A).

Fig. 3
figure 3

Graphic description of the steps followed to obtain the data. A: Taking the photograph; B: Removal of the background with the application Photoshop®; C: Landmark selection by means of the Image J application; D: Conversion of the photograph into a black and white mask; E: Coordinates obtained through the Momocs package in R; F: Transformation into coefficients by means of Fourier ellipses

Traditional and modern morphometric

Traditional morphometric measurements were obtained from the photographs previously scaled with ImageJ v.1.53 (Schneider et al. 2012). Length was taken from the ventral view, corresponding to the Feret diameter (‘calliper length’) of the longitudinal axis. The width corresponds to the Feret diameter of the perpendicular axis from the same view. Finally, the thickness corresponding to the Feret diameter of the longitudinal axis corresponds to the lateral view (Fig. 1). To focus on relative changes, all measurements were log-transformed.

Outline analyses were used to turn seed geometry into a quantitative description of shape. First all photographs were processed individually with Photoshop®, eliminating the background (Fig. 3B). Subsequently, two landmarks were set on the outlines using ImageJ. For the ventral view they correspond to the suture of the two cotyledons of the seed; for the lateral view they equate to the upper and lower ends of the seed (Fig. 3C). Finally, the images were converted to black and white masks (Fig. 3D).

All morphometric and statistical analyses were then made in the R environment using the Momocs package (Bonhomme et al. 2014; R Core Team 2021). Outline (x, y) coordinates were extracted (Fig. 3E) and then normalized for position, rotation, scale and the first point of the outlines using the two landmarks.

Elliptical Fourier transforms (EFT) were then calculated. EFT “describe” the outline as a harmonic sum of trigonometric functions whose coefficients were then used as quantitative variables (Fig. 3F). Each harmonic is associated with four coefficients (two for the x-coordinates and two for the y-coordinates). In this study we used eight harmonics, since they represented 99% of the harmonic power. We thus obtained 64 morphometric coefficients (32 per view).

Statistical analysis

We carried out a methodological test following the recommendations and methodology proposed by other authors in similar works (Evin et al. 2020; Jesus et al. 2021). We also calculated the Total Measurement Error (TME) that was generated by the human factor and by the use of different equipment during the taking of the photographs and adding of the landmarks.

To reduce dimensionality in modern and traditional morphometry variables, a principal component analysis (PCA) was performed on all seed sets. Then Linear Discriminant Analyses (LDA; one for the traditional variables and one for the modern ones) were obtained in each set using the PCA predictors that accumulate up to 90% of the variability. All LDAs were trained on each of the three seed sets, evaluated on the same material using Leave-one-out cross-validation, and then applied to the other materials. Accuracy is the proportion (here presented as percentages) of correctly identified taxa. Thus in analysis with six categories, individuals can be classified with a probability of 17% randomly. We have presented the data as both raw LDA inferences and also after filtering out seeds with a posterior probability < = 0.9.

Results

Testing for errors in the positioning of the seeds and their landmarks

This test has been carried out in the same set of seeds (5 Lens culinaris, 10 Vicia faba and 5 Pisum sativum) in 4 sessions spaced in time and performed in two different laboratories by the same person. The results obtained vary depending on the species and view (ESM 2). Error due to the human factor is more prevalent than error between different laboratories. The biggest margin was between the two sessions in Montpellier on the ventral view of Pisum sativum (TME 36.9%). The rest of the sessions and species had a result below 28% TME, especially Vicia faba, the TME of which was 11.6 to 3.9. On the other hand, the highest values were recorded during the first two sessions, when the technician had no experience in this kind of data collection, improving in the following sessions. In the general comparisons between the four sessions and the two laboratories, the MANOVAs show no significant statistical differences.

Principal component and linear discriminant analysis of the three predictive models

Size variables

The first 3 principal components gathered 100% of the total variance in the PCA performed on the size variables (traditional morphometry) in the three sets of predictor seeds (Fig. 4) and were further used as synthetic shape variables.

Fig. 4
figure 4

Top: Graphic representation of the first two principal components of the three predictive models. Bottom: Graphic representation of the first two categories of the linear discriminant analysis

The results of cross validation of the LDA on the size variables of the seed sets shows the predictive potential of the linear models. All three obtained more than 80% of correct predictions. With a posterior probability < = 0.9, correct predictions are halved, except for uncharred seeds (Fig. 5, more detail in ESM 3).

Fig. 5
figure 5

Predictive effectiveness of each linear model within its own training data. PP = Posterior Probability

Shape variables

The first 6 principal components gathered more than 90% of the total variance in the PCA performed on the shape variables (modern morphometry) in the three sets of predictor seeds (Fig. 4) and were further used as synthetic shape variables. Figure 4 shows that charring scatters taxa, confirming a greater difficulty in the identification of this type of remains.

The results of cross validation of the LDA on the shape variables of the three sets of seeds show better accuracies, with more than 93% correctly identified on the tree sets. With a posterior probability < = 0.9, the discriminant of the shapes of charred and archaeological seeds is more than 25% superior to that of size, especially in the archaeological ones (Fig. 5, more detail in ESM 3). Therefore, predictive models based on shape are superior to those based on size.

Testing the three models against each other

The correct predictions made on the size variables fall within a range of 54.3–19.5%, with two intermediate values of 45% and 40.5% (Fig. 6). These results are less than half, so each prediction is more likely to be wrong than right.

Fig. 6
figure 6

Predictive effectiveness of each linear model applied to sets of seeds of the taxa analysed in this paper of taxa. The calculation is based on individuals not serving as training data for these models

The correct predictions made on the shape variables fall within a range of 89.9–68.4%, with two intermediate values of 83.9% and 81.45% (Fig. 6).

The results show the greater predictive capacity of the shape models over the size models. The first ones exceed 80% correct predictions except in one case. In the second ones, only one model gets 50% of predictions correct (more detail in ESM 4). It is for this reason that only the shape models were applied to the doubtful archaeological seeds. Furthermore, this test reveals that the model based on present-day charred seeds is more effective when applied to the archaeological seeds than the model from non-charred seeds, a logical deduction as both sets are charred. This finding is of interest to future research as it reveals that it is better to turn to carbonised training data rather than uncharred data.

Predicting the taxa of the uncertain archaeological legume seeds

As no predictive model is exact, this study applied the three models based on shape variables of the three studied assemblages to the set of uncertain archaeological seeds. This led to three different potential predictions for each.

Moreover, the study discarded the predictions where no model coincided with another due to the impossibility of a correct geometric morphometry identification. This concerned 23 cases out of the 389 doubtful archaeological seeds (5.9%) (Fig. 7) as they were most often poorly preserved or deformed.

Fig. 7
figure 7

Identification percentages obtained from applying the models on archaeological seeds of doubtful taxonomy from the different sites

Two predictive models coincided in 167 archaeological seeds with uncertain identifications, while the third identified a different taxon. We decided to consider the identifications with two matching predictive models as possibly correct.

Finally, all three models predicted the same taxon among 199 of the 389 uncertain archaeological seeds. As in the previous case, these identifications were treated as possibly correct. Thus, the independent models among 51.1% of the doubtful cases predicted the same taxon and in 42.9% two models coincided yielding a possibly correct identification of 94.1%.

The site of Baume Layrou (Trèves, Gard) yielded the most doubtful cases (144), previously defined as possible Lathyrus. Out of 142 of them, two or three models have agreed on their identification, determining 123 P. sativum, 16 V. sativa, 2 V. faba, and 1 Lathyrus.

There are 88 uncertain archaeologically legume seeds at Lattara (Lattes, Hérault), and in 78 of them, two or three models have agreed on their prediction. In stratigraphic unit (SU) 27565, morphometrics have classified 15 seeds as P. sativum, 5 as V. sativa, 1 V. faba, 1 V. ervilia, and L. culinaris. In EU 50001 17 P. sativum, 7 V. sativa, 1 V. faba, 1 V. ervilia, and 1 undetermined seed have been identified. In SU 53030 there are 7 P. sativum and 2 Lathyrus seeds, in SU 54766 two P. sativum and 2 Lathyrus seeds, and in SU 54782 5 Lathyrus seeds, 2 P. sativum, and 3 undetermined seeds. The remaining EUs have a few individuals.

At the site of La Monédière (Bessan, Hérault), out of the 66 doubtful cases, 64 have generated identifications with 2 or 3 coinciding models. In the 9 SUs, 46 V. ervilia seeds, 11 P. sativum, and 6 Lathyrus seeds have been determined.

Out of the 44 doubtful seeds from the site of Plana del Castell (Cerdanyola del Vallès, Barcelona), all from the same sample, 25 have been identified as V. sativa, 10 as P. sativum, 1 as L. culinaris, and 1 as V. faba; 7 were undetermined. Another 16 poorly preserved and somewhat deformed seeds from Sant Martí d’Empúries (L’Escala, Girona) have been identified as V. sativa (7), P. sativum (4), Lathyrus (2), and one V. ervilia. In SU 179 at the site of Kelin/Villares (Caudete de las Fuentes, Valencia), 7 V. ervilia seeds, 4 Lathyrus seeds, and 4 P. sativum have been determined. The remaining sites in the study area yielded very few doubts (for more details, see Table 1 and ESM 5).

Table 1 Predictions made with either 2 or 3 models in agreement

The predictions obtained in the cases where either 3 or 2 predictive models were congruent, have helped to resolve most of the previous doubts by assigning the seeds to one of the taxa about which there was uncertainty. Morphometry allowed the clarification of the identification of the seeds of Baume Layrou, which probably all belong to the same species, P. sativum. At Lattara, the situation improves compared to the traditional identification, although it seems to be less evident than at the previous site. At La Monédière, it has also helped to clearly differentiate between V. ervilia and L. cicera/sativus, very similar species the identification of which at this site had been doubtful (Pinaud-Querrac’h 2021). This case is more or less similar to that of Plana del Castell, although with far fewer individuals and the uncertainty in this case being between P. sativum and V. ervilia (Guàrdia and Francés 2017). In the rest of the cases, which focus on specific doubts, very good results have also been obtained, although with some individual errors.

Discussion

The margin of error for each model varies between 4 and 21% at a posterior probability of < = 0.9. This has likewise been observed among other species (Ros et al. 2014; García-Granero et al. 2016; Bouby et al. 2018; Jesus et al. 2021). When we apply a model to a sample different from the training one, the margin of error increases by 25 to 30% with a posterior probability of < = 0.9. Therefore, it is essential to visually review the new identifications generated by these models to determine if the predictions are incorrect. This way, we can analyse how these models can help establish the new identifications.

Among the uncertain archaeological seeds, a series of errors obtained from the models were observed. Out of the 144 seeds from Baume Layrou, 123 were identified as P. sativum. The remaining 21 were identified differently, with V. sativa being particularly prominent. The visual re-examination of these seeds, especially when observing the size and shape of their hilum, allowed us to determine that they are actually P. sativum seeds (Fig. 8). The seeds identified as V. faba are deformed and somewhat fragmented. Lastly, the seed identified as Lathyrus is actually probably P. sativum.

Fig. 8
figure 8

Comparison of the hilum of a probable Pisum sativum from the site of Baume Layrou (left) with a contemporary carbonised Vicia sativa (right)

The doubtful seeds from SUs 54782 and 54766 of Latarra seem to be L. cicera/sativus, as the majority of seeds in these units were identified as this taxon. Although some were initially identified as peas, upon further examination, we have observed that they are likely Lathyrus seeds.

Upon re-examining the identifications of SU 53030, it has been observed that they appear to be correct, with both Lathyrus and P. sativum present. The latter species seems to be the only one in SU 27565, as the supposed V. sativa seeds are not actually V. sativa based on their shape and hilum size, while the rest of the identifications are of deformed and somewhat fragmented seeds.

Lastly, the identification of seeds in SU 50001 is more complex, as they are somewhat deformed and fragmented. Geometric morphometrics suggest that the majority are V. sativa, although P. sativum is present. Observations of the hilum of some seeds indicate that they are likely all V. sativa. These identification errors are due to their poor state of preservation.

Geometric morphometry applied to the finds from La Monédière has helped differentiate L. cicera/sativus from V. ervilia. A few individuals here identified as P. sativum are probably L. cicera/sativus. These errors are due to seed deformation and the margin of error inherent in the identification technique.

The predictions for the site of Plana del Castell, where the legumes are slightly deformed and fragmented, mainly point to V. sativa and a few cases of P. sativum. On reviewing the materials, these probably align with the first species.

The geometric morphometry applied to seeds at the remaining archaeological sites appears to corroborate our first revision. The technique thus serves to confirm these identifications, especially in the case of Kelin/Villares and Sant Martí d’Empúries (Table 1, for more details see ESM 5).

The findings of this study thus confirm the potential of geometric morphometry in identifying different species of leguminous seeds. Similar research has been carried out for the archaeological seeds of Papaver spp. (Jesus et al. 2021) and Panicum/Setaria (millets) (García-Granero et al. 2016). This technique appears to be useful in cases where there are specific doubts between several taxa and traditional identification criteria such as the testa or hilum are not preserved.

Moreover, the technique’s potential to differentiate varieties of the same species of certain fruit trees (Bouby et al. 2018; Bourgeon et al. 2018; Sallon et al. 2020; Bonhomme et al. 2021) and cereals (Ros et al. 2014; Bonhomme et al. 2017; Wallace et al. 2019) has already proven to distinguish intraspecific identifications. Differences in shape and size of L. culinaris have been observed within legume species in different samples from Font de la Canya (Avinyonet del Penedés, Barcelona), which could be attributed to potential varieties among other possibilities (Tarongi et al. 2021). Therefore, future analyses of this kind may provide new insights into archaeological legumes.

The technique will nonetheless never replace traditional seed identification. It can serve in the case of well-preserved uncertain archaeological seeds. It should also be noted that predictions are limited to the species introduced in the model. This suggests that certain species from the archaeological record that were not included in the predictive models could have been overlooked and incorrectly identified. This underscores the importance of introducing all possible taxa from a specific region and timeframe. Finally, all re-identified seeds should be reviewed by researchers to determine to which species they might belong so as to eliminate potential errors.

Therefore, the results discard certain identifications of L. cicera/sativus and V. ervilia based on Bronze Age contexts and confirm our previous ideas that concentrations of these species only appear from the Iron Age onwards. The similarities between L. cicera/sativus, V. ervilia, and P. sativum could have led to confusion in the identification of findings in Neolithic and Bronze Age sites.

For this reason, we believe that a thorough review of the scarce seeds of these species (V. ervilia, L. cicera/sativus) prior to the Iron Age using this technique could help determine if they were indeed cultivated before this period. The results would open the door to a new approach to the role of legumes throughout the Prehistory of the Western Mediterranean.

Conclusions

As prior research has confirmed the predictive effect of geometric morphometry, this study applied the technique to a number of uncertain taxonomic determinations of legumes from a group of Bronze and Iron Age sites in the Western Mediterranean. The study also confirmed the greater predictive power of shape variables obtained through geometric morphometry over those of size variables obtained traditionally. The results, despite inherent margins of error, are very positive both at the level of testing the models against each other and identifying doubtful archaeological seeds. This widens the potential of pursuing this line of research by expanding both the number of individuals and taxa in the predictive models and extending their application to other regions and chronological timeframes. Furthermore, their effectiveness also lays the path to approach more specific models to address current problems among the family of pulses, such as differentiating between Lathyrus cicera and sativus, or to search for intraspecific distinctions that allow the observation of differences in shape and size within the same species in specific regions and timeframes.