Introduction

Almond [Prunus dulcis (Mill.) D.A.Webb] is an economically important crop cultivated in more than 50 countries, with 95% of production in California, Australia, and the Mediterranean Basin. In California, almonds are the most produced tree crop and the second highest agricultural commodity with a value of $6.1 billion per year covering 1.2 million acres (CA Dept Food Ag 2019). The UC Davis almond breeding program has historically used phenotypic selection, which has several limitations, most prominently the amount of time required to observe and select for advantageous and accurate phenotypes, and has also developed several molecular markers for pedigree validation. More recently, identification, development and deployment of additional molecular markers has been initiated for several critical traits, enabling more robust and speedier selection. With advances in both genomic and phenotyping technologies, accuracy in selection can be made more expediently.

Molecular markers have been used for decades and are widely used in genetic based studies due to their capability to distinguish different genotypes at a locus. Genotyping-by-sequencing (GBS) is a next-generation technology with reduced representation of a genome where restriction enzymes are used, DNA fragments are ligated with barcoded adapters and subsequently sequenced (Elshire et al. 2011). The result is that large numbers of single nucleotide polymorphisms (SNPs) can be discovered for evaluating genetic diversity, QTL mapping, linkage mapping and association studies. Kompetitive allele-specific PCR (KASP) technology (LGC Limited) can be used for SNP genotyping based on dual fluorescent resonance energy transfer (FRET) cassettes. The KASP chemistry combines the use of a highly specific 5′-3′ exonuclease Taq DNA polymerase with two competitive, allele specific, tailed forward primers and one common reverse primer. It has been used for selection of traits and development of KASP assays to validate marker studies for evaluating root-knot nematode resistance in almond rootstock (Duval et al. 2018), fruit ring rot disease in apple (Shen et al. 2019) and shell hardness in almond (Goonetilleke et al. 2018).

Almond shells account for around 35–75% of the total fruit weight (Li et al. 2018).

The shell is made up of cellulose, hemicellulose and lignin. The content of cellulose in almond shells is 38%, while lignin content of almond is 44% (Li et al. 2018). Depending upon the industry, in a corresponding region there is a preference for either a soft or a hard shelled almond. For example, in the USA, the California industry prefers a softer shell, as evident by the dominant cultivar ‘Nonpareil’, which is considered a paper-thin shell, as rubber rollers are used for processing. In the Mediterranean regions such as Spain and Italy, hard shell varieties such as ‘Tuono’ and ‘Penta’ dominate the industry. These regions are usually dryland farmed and the hard shells provide protections from insect, birds, rodents and even rancidity (Verdu et al. 2017). In general, the harder the shell, the smaller the crack-out percent and relative nut size is, but this is not always the case; there can be hard, thinner shells with high crack-out percent, and even large kernels/crack-out percent in hard shells. A softer shell can have an increased susceptibility to insect damage such as by navel orange worm. In the U.S., insecticides are generally applied to the almond tree canopy twice (10–14 days apart) at the onset of hullsplit (1–5%), and once again a week prior to harvest to prevent insect damage to kernel.

There are nine descriptors for shell hardness, the ease of cracking by hand, or difficulty of cracking with a hammer, as defined by Gulcan (1985) (1- extremely hard, 3- hard, 5- intermediate, 7- soft and 9- paper). Shell hardness has been found to be highly correlated (r = 0.75 or greater) with shelling or crack-out percentage (Spiegel-Roy and Kochba 1981; Fornes Comas et al. 2019) less than 20% crack-out percent in hard shells to more than 60% in paper shells. For shelling, herein called crack-out percent, five classes have been established: very hard (< 30%), hard (30–40%), semi-soft (40–50%), soft (50–60%) and paper (> 60%) (Batlle et al. 2017; Fornes Comas et al. 2019).

Shell hardness has been measured by shell thickness, shell weight, shelling percentage and with a texture analyzer. A texture analyzer is an instrument that measures resistance to force of something and has been used to evaluate shell hardness in pistachio (Nikzadeh and Sedaghat 2008), walnut (Sideli et al. 2020) and almond (Fornes Comas et al. 2019).

The objective of this research was to perform genome-wide association for seven almond families, common cultivars and peach accessions; collect phenotypic measurements across two years, develop KASP molecular markers, and test and validate those markers on an unrelated set of trees. The results from this study aim to aid in marker-assisted breeding efforts for a higher crack-out percent.

Materials and methods

Plant material

A panel consisting of 264 almond trees (12 cultivars, 10 selections and 236 seedlings from crosses made in the UC Davis almond breeding program) and six peach trees (Online Resource 1) were used. The almond trees which were grown at Wolfskill Experimental Orchards in Winters, CA, ranged in age from 7 to 10 years. Trees were grown on their own rootstocks and watered with micro sprinklers. Nutrient management was the same for each rows of trees. There was one experimental tree per genotype. Nuts were collected from varying levels of the canopy and sides of the tree. Nuts were hand harvested after hullsplit (August – September) in 2019 and 2020, dried in an almond dryer and stored inshell in a 0° C ± 0.4 °C fridge with 82.8% ± 2.2% humidity until use. Average temperature and total rainfall for 2020 was higher than in 2019 (Online Resource 2) (University of California Cooperative Extension).

Additional materials from the USA, Australia, Spain, Italy and France (Online Resource 3) were used for evaluation of KASP markers.

Genotyping

Genotyping-by-sequencing (GBS) was performed with DNA samples from the GWAS panel using ApeKI as a restriction enzyme. A custom python script was used to run gbstrim (https://github.com/kdm9/gbstrim) in batches to trim reads to a maximum length of 93. Stacks 2 (Rochette et al. 2019) was used to clean data by truncating reads to obtain expected amplicon lengths. Bowtie2 v2.4.0 (Langmead et al. 2018) was used to index genome and align reads to peach genome V2.1 ‘Lovell’ available at Phytozome (URL: phytozome.jgi.doe.gov). SAMtools 1.12 (Li et al. 2009) was used to convert sam files to sorted bam files. Freebayes 1.3.6 (Garrison and Marth 2012) was used for SNP variant calling. After filtering for mapping quality (minimum 90%), allele balance, mean read depth (maximum 30), minor allele count (minimum 3), minor allele frequency (MAF) (minimum 0.05), missing SNPs (maximum (0.75 per genotype), read mapping quality score (minimum 20) and read depth differences between forward and reverse strands (maximum 100-fold), removing indels, selecting only bi-allelic SNPs and removing trees with more than 50% missing data, there were 12,406 SNPs across 264 trees. Imputation was performed in Tassel 5 (Bradbury et al. 2007) with the LinkImpute LD-KNNI algorithm based on k-nearest neighbor genotype imputation method (Money et al. 2015).

In preparation for principal component analysis (PCA), SNP data were pruned based on linkage disequilibrium (LD). The PCA was performed with R statistical package ‘SNPrelate’ v1.10.2 (Zheng et al. 2012).

Phenotypic measurements

From among the nuts harvested from each tree, 10 nuts were sampled at random. Each of these nuts was weighed to obtain its in-shell weight, then subjected to texture analysis to determine the force and time needed to rupture the shell. Shell thickness was measured, and the kernel was weighed. Shell weight was determined by difference (in-shell weight – kernel weight) and crack-out percentage was calculated ((kernel weight/in-shell weight) × 100%). In-shell and kernel weights were measured with an Ohaus Scout SPX balance (Parsippany, NJ) using a USB device interface SPDC V2.03 for automatic capture. Shell hardness was measured using texture analyzers (models TA-XT2 and TA-HDPlus; Texture Technologies; Surry, England) and Exponent 6.1.8.0 software (Stable Micro System; London, UK). The application of force (N) and time (seconds) to rupture the almond shell were automatically recorded with a custom macro to capture the maximum force and the integral (the area under the curve of time and force). The TA-XT2 texture analyzer was set as follows: pre-test speed (1.0 mm/s), test speed (2.00 mm/s), post-test speed (10.0 mm/s), trigger force (0.005 kg), return to start distance (4.0 mm for hard shells and strain 20% for softer shells) and calibrated with 2.0 kg weight. For almond shells that were too hard for the TA-XT2 texture analyzer, which has a maximum load cell capacity of 50 kg, a TA-HDPlus texture analyzer with a load cell of 750 kg was used. Shell thickness was measured using Mahr 16ER calipers and Marcom Professional 5.2 -6.2/13.06.2018 software for automatic capture.

The R package ‘emmeans’ was used to calculate the adjusted means of two years, with family and genotype as fixed effects and genotype and year set as an interaction. The model was:

$${Y}_{ijkl}=\mu +{f}_{j}+gk+{y}_{i}+\left(yg\right)\mathrm{ik}+{\varepsilon }_{\mathrm{ijkl}}$$
(1)

where Yijkl stands for the lth observation of the family j in the year i, μ is the constant overall mean, ygik is the year and genotype interaction and ε ijkl is the random error term with mean of 0 and a variance σ2ε. Maximum force and integral measurements were log transformed.

Variance component estimates and estimated breeding values

The ASReml-R 4 software tool was used to estimate variance components from the phenotypic data of 264 trees. A historical pedigree was used for the a- matrix. The linear mixed model was:

$$Y=X\tau +Zu+e$$
(2)

where Y denotes the (n × 1) vector of observations, τ is the (p × 1) vector of fixed treatment effects (year and family), X is the (n × p) design matrix, u is the (q × 1) vector of random effects (pedigree), Z is the (n × q) design matrix and e is the (n × 1) vector of residual error. The pedigree used was a historical pedigree from breeding records. A conditional Wald F-test was used to test the significance of fixed effects in the model. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to choose the best model. Narrow-sense heritability (h2) was estimated using data from 264 trees across 7 families by dividing the estimated additive genetic variance by the total phenotypic variance. Estimated breeding values were calculated in ASReml-R 4 using the predict function on the linear mixed model.

Genome-wide association analysis

Genome-wide association (GWAS) analysis was applied to estimated breeding values using a multi-locus mixed linear model (MLMM) algorithm in the R package GAPIT 3 (Wang and Zhang 2020). The MLMM is a multi-linear model (MLM) where both Q (population structure) + K (kinship matrix) are fitted to the model as random effects, reducing type I errors due to spurious associations from relatedness and population structure. Q-Q plots and Manhattan plots were inspected for evidence of inflation. The maximum number of PCs to include as covariates in the multivariate model was determined by examining the initial PCA and scree-plot, then the number of PCs to include was determined using the function model selection implemented in GAPIT. SNPs with MAF below 0.05 were excluded. SNP data file used as an input for GWAS is available in (Online Resource 4). PLINK (Purcell et al. 2007) was used to calculate the linkage disequilibrium between significant SNPs on the same chromosome. ‘LD heatmap’ (Shin et al. 2006) package in R was used to visualize regions of LD around significant SNPs.

The gene annotation Prunus persica V2.1 was used to assess annotated genes in the regions around SNPs within 100 kb on each side of SNP, as these regions had R2 > 0.6 and putative genes were surmised.

KASP assay design and genotyping

For each SNP that was significantly associated with log maximum force, shell thickness and/or crack-out percent, SAMtools (Li et al. 2009) function faidx was used to retrieve a 100-bp sequence containing the SNP. That sequence was analyzed using Kraken software (LGC Limited) to design primers for a KASP assay. Each primer set was named according to the chromosome and position of the SNP for which it was designed.

The newly designed KASP assays were applied to DNA samples from 332 trees (216 from the USA, 14 from Australia, 87 from Spain, 8 from France, 4 from Italy and 3 of unknown origin). In addition, two KASP assays designed by Goonetilleke et al. (2018) were applied to DNA samples from 216 UC Davis varieties/selections that had phenotypic data from this study.

The KASP genotyping was conducted on SNPLine (LGC Limited) platforms according to the manufacturer’s instructions. Results were analyzed using Kraken (LGC Limited) software.

Results

Genotyping-by-sequencing and principal component analysis

In sequences generated by GBS, only 30% could be aligned to ‘Texas’ almond genome sequence assembly, but 83% of them were aligned to version 2.1 of the peach ‘Lovell’ sequence assembly. Genotyping-by-sequencing of a panel of 280 trees generated a total of 2,162,690 total variants and after filtering, there were 12,406 SNPs across 264 trees. Mean sequence depth was 10. Imputation accuracy was found to be 89.97%. With LD pruning, a subset of 4,137 SNPs was selected for PCA. Application of PCA to this subset clearly separated all almond trees from the six peach trees (Fig. 1). Among the almond progeny, those derived from ‘Ferragnès’ crosses, ‘Nonpareil’ \(\times\) 02,1–271, ‘Tardy Nonpareil’ \(\times\) 95,1–26 and ‘Winters’ \(\times\) 00,2–3 mostly formed distinct clusters, except that some progeny of ‘Winters’ \(\times\) 00,2–3 overlapped with progeny of ‘Nonpareil’ \(\times\) 02,1–271 and progeny of ‘Tardy Nonpareil’ \(\times\) 95,1–26 clusters and a few progeny of ‘Ferragnès’ \(\times\) 99,9–86 were between the ‘Ferragnès’ and ‘Nonpareil’ clusters.

Fig. 1
figure 1

Principal component analysis for 264 trees and 4137 markers

Shell traits

The distributions of residuals for maximum force and for integral deviated significantly (p < 0.001) from normality (Shapiro Wilk (W) = 0.894 and 0.906, respectively). Log transformation of these two variables reduced skewness, but did not completely eliminate the deviation from normality (W = 0.981, 0.972, p < 0.001 for both). The range, mean and median for shell thickness, log maximum force, log integral, kernel weight, crack-out percent and shell weight are listed in (Online Resource 5).

Estimates of narrow-sense heritability (h2) were high for shell weight (0.88 ± 0.01), shell thickness (0.88 ± 0.02), maximum force (0.77 ± 0.03), integral (0.69 ± 0.04) and crack-out percent (0.81 ± 0.65). Estimated breeding values (EBVs) for shell thickness, log maximum force, log integral and crack-out percent were distributed normally, while EBVs for shell weight were strongly skewed towards low values (Fig. 2). Estimated breeding values for log maximum force were distributed normally. Estimated breeding values for maximum force and log interval were positively correlated with each other (r = 0.95) and with shell thickness (r = 0.66 and 0.71, respectively) and shell weight (r = 0.59 and 0.60, respectively).

Fig. 2
figure 2

Distribution and correlations of estimated breeding values for shell quality traits. Graphical representation to display the trait distribution on the diagonal and correlation plots to the left of diagonal for a log maximum force, b log integral, c shell thickness, d crack-out percent and e shell weight

Families with ‘Ferragnès’ as one parent had the largest breeding values for maximum force (up to 1,249.00 N) (Online Resource 6). Families with ‘Nonpareil’ as one of the parents had lower breeding values for maximum force (77.58 N) and high breeding value for crack-out percent.

Genome-wide association analysis

Significant marker-trait associations were detected on chromosomes 2, 5, 7 and 8 (Table 1 and Fig. 3), involving a total of seven SNPs. Each of these SNPs was within a predicted gene on the peach genome. For shell thickness, the associated SNP on chromosome 5, SPP05_11616743, was found to be within a predicted gene region at position 11,691,415–11,694,929 bp for galactosyltransferase family protein (Table S5) (Online Resource 7).

Table 1 Markers associated with shell traits. maf minor allele frequency, R2 variance explained for SNP, underlined nucleotide signifies which had a positive effect
Fig. 3
figure 3

Genome-wide Manhattan plots for almond shell traits. Multi-linear mixed model (MLMM) was used on the estimated breeding values (EBVs) for a log maximum force, b shell thickness, c shell weight, d crack-out percent

Marker SPP02_23368051 (an A:G SNP at position 23,368,051 on chromosome 2 of the peach genome assembly) was associated with four traits: maximum force, shell thickness, shell weight and crack-out percentage. Almonds from trees with the A:G genotype at this position SPP02_23368051 required greater maximum force and had thicker shells, higher shell weight and lower crack-out percent (Fig. 4a–d) than those from G:G trees. Both genotypes were observed within the four families of ‘Ferragnès’ progeny and within the ‘Nonpareil’ \(\times\) 02,1–271 family, but only the G:G genotype was observed within the ‘Tardy Nonpareil’ \(\times\) 95,1–26 and ‘Winters’ \(\times\) 00,2–3 families. Based on examination of genotypic and phenotypic data, it was clear that the significant associations observed for this SNP were largely due to segregation and marker-trait associations within the ‘Ferragnès’ and ‘Nonpareil’ families (Fig. 5 a, b).

Fig. 4
figure 4

Genotype plots for most significant marker-trait associations. SNPs displayed with differences in phenotype across genotypes. a SPP07_14887902 for log maximum force, b SPP02_23368051 for log maximum force, c SPP05_11616743 for shell thickness, d SPP02_2336051 for shell thickness

Fig. 5
figure 5

Genotype by phenotype plots for 4 traits, 2 families and all crosses. Parents are labeled on plots. a1-4 Ferragnès crosses for log maximum force, shell thickness, shell weight and crack-out percent. b1-4 Nonpareil × 02,1–271 for log maximum force, shell thickness, shell weight and crack-out percent. d1-4. c1-4 All crosses for log maximum force, shell thickness, shell weight and crack-out percent

Two other SNPs, both on chromosome 7, were associated with a requirement for higher maximum force (Table 1 and Fig. 3). Almonds from trees with the T:T genotype at SPP07_14887902 required greater maximum force than those from C:T trees (Fig. 4f). Almonds from trees with the G:G genotype at SNP SPP07_16579940 required greater maximum force than those from A:G trees. These two SNPs were in moderate LD (r2 = 0.393).

On chromosome 5, two A:G SNPs (SPP05_8708350 and SPP05_11616743) were significantly associated with shell thickness. At each of these SNPs, almonds from A:A trees had thicker shells than those from A:G trees (Fig. 4e). These two SNPs were in moderate LD (r2 = 0.564).

Two other SNPs were significantly associated with shell weight: one on chromosome 5 and one on chromosome 8. Almonds from trees with the A:G genotype at SPP05_3719741 had higher shell weight (Fig. 4h) than those from A:A trees. Almonds from trees with the T:C genotype at SPP08_22165513 had higher shell weight than those from C:C trees (Fig. 4g). For shell weight, there were additional significant SNPs, one on chromosome 2 and one on chromosome 8 (Fig. 5C), however they were not in LD (r2 = 0.009, 0.15 respectively) with the most significant SNPs (SPP02_23368051, SPP08_22165513).

Application of KASP assays

All five of the KASP assays designed for SNPs associated with log maximum force, shell thickness and/or crack-out percent (Online Resource 8) successfully discriminated among genotypes within the panel on which they were tested (Online Resources 9 and 10). For members of that panel that had also been analysed by GBS, the KASP genotype calls agreed with those from GBS.

Of the five SNPs for which KASP assays were developed and applied, only SP02_23368051 exhibited the expected association with shell hardness (Table 2, Online Resource 11).

Table 2 Classification of almond trees according to their SNP genotypes obtained with KASP markers and shell hardness classification

Among 205 trees that were genotyped for WriPdK50 (Goonetilleke et al. 2018) and that could be classified as having soft shells (greater than 40% crack-out) or hard shells (less than 40% crack-out) the proportion of soft-shell types was much higher (0.85) among trees genotyped as C:C than among trees genotyped as C:A (0.07) (Online Resource 12). Among 207 trees that were genotyped for WriPdK50 and that could be classified as having soft or hard shells, the proportion of soft-shell types was higher (0.81) among trees genotyped as G:G than among trees genotyped as T:G (0.56).

Discussion

Almond varieties and germplasm can exhibit considerable variation in allocation of biomass between kernels and shells. Differences in this property are often assessed as crack-out percent (kernel weight as a percentage of nut weight), which in turn is used to classify nuts into hard (lowest crack-out percent), semi-hard, semi-soft, soft and paper-shell types (highest crack-out percent), with different types preferred in different markets. The actual hardness of almond shells can be more directly measured with texture analyzers, which provide estimates of the force required to rupture the shells, while shell thickness can also be measured. Given sufficient variation, it is expected that the force required to rupture shells will be positively associated with shell thickness and negatively associated with crack-out percentage. Consistent with this, and with some previously reported results (Ledbetter (2008) and Sánchez-Pérez et al. (2007)), crack-out percent was negatively correlated with both shell thickness and the force required to rupture almond shells. In contrast, Fornes Comas et al. (2019) did not find a strong correlation between shelling load and kernel percentage, probably because of limited variation within their panel of hard-shelled cultivars.

Using genotypic data obtained by GBS, we conducted GWAS for maximum force, shell weight, shell thickness and crack-out percent, all of which had high heritability. Consistent with strong correlations among traits, one SNP on chromosome 2 (SPP_023368051) was significantly associated with all four traits. QTLs for related traits have previously been reported on chromosome 2 in the biparental crosses ‘Ferragnès’ × ‘Tuono’ (Arús et al. 1999), ‘R1000’ × ‘Desmayo Largueta’ (Sánchez-Pérez et al. 2007) ‘Nonpareil’ × ‘Lauranne’ (Goonetilleke et al. 2018).

Six other SNPs (three on chromosome 5, two on chromosome 7 and one on chromosome 8) were also significantly associated with individual traits. Although marker-trait associations had previously been reported on chromosome 5 (Goonetilleke et al. (2018)) and chromosome 8 (Arús et al. (1999); Goonetilleke et al. (2018), the SNPs identified here were over 3 Mbp (chromosome 5) and were over 9.4 Mbp (chromosome 8) from the previously reported loci.

Of six shell-hardness associated SNPs for which that Goonetilleke et al. (2018) designed KASP assays, two are in the same region of chromosome 2 as a SNP (SPP02_23368051) for which trait associations were detected here. The KASP assays WriPdk50 (which assays a SNP at 108 kb from SPP02_2336805) and WriPdK264 (which assays a SNP at 562 kb from SPP02_23368051) respectively, both revealed polymorphism within our GWAS panel, with homozygous genotypes associated with softer shells, consistent with what was reported by Goonetilleke et al. (2018). However, they were not in LD with SPP02_23368051 (Online Resource 12).

A KASP assay developed for SPP02_23358051 which gave results that were highly predictive of shell hardness across a diverse panel of materials, indicating that this marker could be broadly useful in almond breeding. Among KASP assays developed for four other SNPs two (for SPP05_11616743 and SPP05_8708350) were moderately predictive of shell hardness, and two (for SPP07_14887902 and SPP07_16579940) were not predictive of shell hardness.

In this study utilizing a GWAS panel we confirmed major marker-trait associations on chromosome 2, 5 and 8, and found new associations on chromosome 7. The detection of associations on chromosome 7 may be due to the inclusion of many soft-shell selections in our panel, in contrast to the predominance of hard-shell materials in most previous studies. The SPP_023358051 marker had a clear segregation in the ‘Ferragnès’ and ‘Nonpareil’ families while the other families, ‘Tardy Nonpareil’\(\times\) 95,1–26 and ‘Winters’\(\times\) 00,2–3 did not segregate for the shell traits because the parents were too similar in their genotype and there was not enough phenotypic variation to determine a significant difference. The families surveyed carry only the heterozygous or one type of homozygous genotypes, or only one homozygous genotype.

One of the trait-associated SNPs identified in this study (SPP05_11616743, which is associated with shell thickness) is located 74,672 bp from a predicted gene (Prupe.5G113400) that may be involved in cellulose production. This is of interest given that cellulose is a major component contributing to the thickness of the almond shell (Li et al. 2018). Prupe.5G113400/ XM_007220948.2 has 74% sequence similarity to the Arabidopsis thaliana gene AT5G53340.1, has been predated to encode galactosyltranferase protein. In A. thaliana, galactosyltransferases GALT7 and GALT8 were found to participate in cellulose biosynthesis, with mutants showing primary cell wall defects with impaired growth and cell expansion in seedlings, and secondary cell wall defects with collapsed xylem wall thickness (Nibbering et al. 2022).

KASP assays are particularly useful for applications, such as marker-assisted selection, that require genotyping of large populations for relatively few markers. Here, we developed KASP assays from SNP markers that were discovered with GBS and found to be associated with traits of interest. By applying the KASP assays to a diverse panel of materials we found that that SPP_023358051 is highly predictive of shell hardness, indicating that this marker could be broadly useful in almond breeding. This is based on its strong associations with phenotypic traits measured here, and on its validation using a KASP assay applied to a different panel and materials. Other KASP assays that were developed and/or applied here were effective in discriminating among SNP genotypes, but did not exhibit any associations with traits.

Among the results reported here, the association of seven SNPs with kernel and nut traits provides a basis for further exploration of the genetic control of these traits in almond, and a KASP marker developed for SPP_023358051 may be immediately useful in almond breeding.