Skip to content
Publicly Available Published by De Gruyter November 1, 2023

Plackett–Luce modeling with trajectory models for measuring athlete strength

  • Katy McKeough ORCID logo EMAIL logo and Mark Glickman

Abstract

It is often the goal of sports analysts, coaches, and fans to predict athlete performance over time. Models such as Bradley–Terry and Plackett–Luce measure athlete skill based on results of competitions over time, but have limited predictive strength without making assumptions about the nature of the evolution of athletic skill. Growth curves are often applied in the context of sports to predict future ability, but these curves are too simple to account for complex career trajectories. We propose a non-linear, mixed-effects trajectory to model the ratings as a function of time and other athlete-specific covariates. The mixture of trajectories allows for flexibility in the estimated shape of career trajectories between athletes as well as between sports. We use the fitted trajectories to make predictions of an athlete’s career trajectory through a model of how athlete performance progresses over time in a multi-competitor scenario as an extension to the Plackett–Luce model. We show how this model is useful for predicting the outcome of women’s luge races, as well as show how we can use the model to compare athletes to one another by clustering career trajectories.

1 Introduction

The ability to predict an athlete’s performance is highly sought after by sports analysts, coaches, and fans. For example, inferring when an athlete’s ability may peak could allow coaches to make more intelligent recruiting and resourcing decisions. Most competitive sports can be divided into head-to-head and multi-competitor games. Head-to-head sports include games like basketball, tennis, and soccer, and athletes’ abilities can be measured with paired-comparison models. Many multi-competitor sports are races, where players compete for the fastest time, or point-based systems where judges award players points based on performance. Sports that fall under this definition include marathons, Formula 1 racing, and diving. Modeling performance of athletes in these multi-competitor sports is less studied than in their head-to-head counterparts. Head-to-head sports can be conceptualized as a specific case of multi-competitor sports in which only two athletes compete at a time. As a result, multi-competitor analytic methods can be adapted to head-to-head sports.

One approach to measuring strength over time in multi-competitor and head-to-head sports is to use a growth-curve model. A growth curve is a parametric representation of how a numerical quantity changes over time. Growth-curve-based (parametric) methods of prediction can be advantageous when compared to methods that do not assume pre-specified time-relationships because they can more easily capitalize on built-in assumptions that are made when modeling player strength over time. Conversely, unless cautiously addressed, nonparametric methods can result in high prediction errors when extrapolating out of sample, limiting their utility to description rather than prediction. For head-to-head sports where the outcome of interest is a binary indicator of winning versus losing, the most commonly used model is arguably the Bradley–Terry model (Bradley and Terry 1952). The Plackett–Luce model, also known as the rank-order logit model, is a commonly used approach to modeling multi-competitor games in which only rank orderings are recorded. This model was created as an extension to the multinomial logit choice model Luce (1959) for rank orderings (Plackett 1975). The Plackett–Luce model is the primary focus of this paper. It is worth mentioning that when rank orderings involve only two competitors at a time, the Plackett–Luce model becomes the Bradley–Terry model as a special case. The Plackett–Luce model without modification provides no way of accounting for changes in athlete strength over time.

Previous work has explored modeling time-varying competitor strength. Several nonparametric approaches capture estimates of player performance over time by using the Plackett–Luce model, including fitting splines (Baker and McHale 2013) and using a gamma process (Caron and Teh 2012) to model strength as a function of time. Bayesian approaches using stochastic updates include TrueSkill which models strength over time using normally distributed updates to the ability parameter, and Glickman and Hennessy (2015), henceforth denoted as GH, which propagates performance over time via a Gaussian random walk. None of these models accounts for player-specific covariates, including how performance might change due to an athlete’s age or other time-varying traits. A separate approach for estimating athlete skill are dynamic models such as Markov transition models. Markov transition models have been used to predict the number of home runs a Major League Baseball player will hit based on their past performance (Glynn and Tokdar 2017; Jensen, McShane, and Wyner 2009). Although these models have demonstrated predictive power, they are only applied to a measured performance (home runs) and are not used to estimate skill from rank-ordered results. A modification to the Plackett–Luce model that accounts for systematic changes in athlete strength over time while providing a means for predicting future performance is missing from the literature. We are not only interested in capturing an athlete’s current ability, but also predicting how that might change in the future. Our contribution is to use a parametric approach by modeling athlete strength over time using a growth curve, which we will call “trajectory models” for the remainder of the manuscript.

We present a novel model for player strengths in multi-competitor sports that incorporates a flexible growth-curve model to estimate the evolution of athlete performance over time. In Section 2, we introduce the Plackett–Luce model and discuss the construction of a trajectory model to estimate performance over time. We summarize the model fitting and selection techniques in Section 3. In Section 4, we discuss the application of our methods to the results of professional women’s luge events, including an evaluation of the predictions and an exploratory extension using clustering. The paper concludes in Section 5 by describing limitations and extensions to this work.

2 Model specification

During T discrete time periods, we observe n athletes that participate in observed competitions. Each athlete i = 1, …, n has an ability parameter θ it that indicates the athlete strength at a given time period t = 1, …, T. Within each time period, K t competitions take place where, in competition k = 1, …, K t , m kt competitors participate.

2.1 Plackett–Luce modeling

For competition k within time period t, assume athletes 1, 2, …, m kt compete. Assuming no ties, let R i , i = 1, …, m kt be the rank of athlete i from the results of the competition. That is, R i = j is equivalent to athlete i coming in jth place. Suppose { i 1 , i 2 , , i m k t } is a permutation of {1, 2, …, m kt }. Then the Plackett–Luce model (Luce 1959; Plackett 1975) asserts that

(1) Pr ( R 1 = i 1 , R 2 = i 2 , , R m k t = i m k t | θ t ) = j = 1 m k t 1 exp ( θ i j t ) = j m k t exp ( θ i t ) ,

where θ t = ( θ 1 t , , θ m k t t ) , and θ it for i = 1, …, m kt is the strength parameter for athlete i at time period t. The model in (1) is essentially a telescoping product of multinomial logit probabilities in which the first term is the probability player i1 outperforms everyone else in the competition; the second term is the probability that player i2 outperforms everyone else in the competition excluding player i1; the third term is the probability that player i3 outperforms everyone else in the competition excluding players i1 and i2, and so on.

The Plackett–Luce model can be derived through a latent variable framework. This representation will be helpful in Section 4.1 when evaluating our fitted model. Suppose the performance of athlete i at time t with strength parameter θ it follows a specified probability distribution. In particular, let Y itk be the unobserved latent performance by competitor i at time t during competition k. If we assume a cumulative distribution function

(2) Pr ( Y itk < y | θ i t ) = exp ( exp ( ( y θ i t ) ) ) , < y <

that is, the Y itk follows a Gumbel distribution, then the probability athlete i1 outperforms i2 who outperforms i3, and so on, can be shown to be

(3) Pr ( Y i 1 t k > Y i 2 t k > > Y i m k t t k | θ t ) = P r ( R 1 = i 1 , , R m k t = i m k t | θ t ) = j = 1 m k t 1 exp ( θ i j t ) = j m k t exp ( θ i t ) .

This result is due to Plackett (1975), and can be easily shown using basic probability calculus. The likelihood contribution for multiple competitions is simply the product of (1) over the competitions.

2.2 Trajectory model

A growth curve, to which we refer in this manuscript as a trajectory model, is any parametric function that describes how a quantity changes over time. Shapes can range widely and may take the form of linear, exponential, logistic, or S-shaped models. Trajectory models can also include more complicated relationships like mixed-effects or nonlinear regression (Panik 2014; Pinheiro and Bates 2000). Growth-curve models are well-proven and have been utilized for prediction in many fields, including sports-analytics literature. However, the trajectory models commonly used to predict athlete strengths are too simple to account for complicated changes in performance over time, such as flexible career trajectories impacted by injuries. Models typically use a quadratic function to find the time of the athletes’ optimal performance, such as Brander, Egan, and Yeung (2014) with National Hockey League data, Malcata, Hopkins, and Pearson (2014) with triathlete data. Examples of more general trajectory models include Moudud et al. (2008) who use the nonlinear logistic trajectory model with a mixed-effects trajectory model to predict the speed of youth cross country skiers, and Bradlow and Fader (2001) who use the generalized gamma curve to model the ranking of songs on Billboard charts. Both of these trajectory models use an exponential decay term with the expectation that, over time, performance converges to a specific value or decays altogether, a characteristic we expect to be shared with an athlete’s career trajectory.

The main alternative to parametric trajectory models is to assume a stochastic process on the θ it . This is the approach taken by GH. Specifically, the GH model assumes a Gaussian random walk on the θ t of the form

(4) θ t + 1 | θ t , τ 2 N θ t , τ 2 I

where τ2 is the innovation variance of the Gaussian random walk. Other assumptions on the stochastic process in (4) may be made, such as the inclusion of an autoregression factor, but as shown in Glickman and Hennessy (2015) the basic random walk works well in practice. A challenge using the GH approach is in the difficulty of forecasting beyond several time periods. The model assumes that the expected ability of an athlete at time t + 1 is centered at their ability at time t.

Instead of including a stochastic process component, we can extend the Plackett–Luce model by including a trajectory model component. Trajectory models are useful for modeling career trajectories because they can be constructed in a flexible manner. While a quadratic function describes the typical athlete trajectory, increasing performance until reaching their peak and then declining in performance, it does not allow for more complex trajectories such as careers with several peaks due to injuries or adoptions of new strategies. To encourage flexibility in the model, the trajectory model in Equation (5) allows for a polynomial of varying degrees. We assume a novel parametric growth-curve model on θ it as a function of the current period t and the period in which the athlete first competed t0i. The model we assume on θ it is

(5) θ i t = α i + β i 0 + β i 1 t 1 * + β i 2 t 2 * + β i 3 t 3 * + + β ip t p * e ω ( t t 0 i ) ,

where the bracketed term is a pth order polynomial. Intercepts and coefficients α i and β i = {βi0, βi1, βi2, …, β ip } vary per individual, and the rate, ω > 0 is fixed across individuals. The β i and α i are competitor-specific random effects. The intercepts α i not only adjust for different starting strength, but they also represent the limit of decay over a long career. The coefficients βi0, βi1, …, β ip and intercept α i parameters differ by athlete, since we do not expect each athlete to have the same career trajectory.

The stabilization factor of ω shows the performance decaying over time. This concept is intuitive because of the physical nature of sports, an athlete’s performance decays with age. The parameter ω remains fixed across athletes because we believe the overall stabilization factor of an athlete’s performance should be consistent between athletes in the same sport, on average. The polynomial component provides enough flexibility to account for variation in stabilization factors on an individual level.

The order of the polynomial is a free parameter p that needs to be chosen via model selection as described in Section 3. To reduce correlation among polynomial orders of time, we use orthogonal polynomials of tt0i = 0, …, Tt0i, denoted as t 1 * , t 2 * , , t p * . Orthogonal polynomials are calculated across the entire data set by creating an orthogonal basis from a QR decomposition via Gram–Schmidt orthogonalization. The orthogonal polynomials are calculated through time T, and for forecasts we can continue the Gram–Schmidt orthogonalization on times T + 1, T + 2, …. See Appendix A for details on the creation of the orthogonal polynomials in fitting and prediction. Raw time is used in the exponent since orthogonal polynomials are not guaranteed to be non-negative. To ensure identifiability of the strength parameters, we center the θ it at every time t by requiring i = 1 n θ i t = 0 .

The intercept and coefficient parameters are modeled as random effects, allowing for shrinkage toward a population mean for the parameters of athletes with few observations. Malcata, Hopkins, and Pearson (2014) chose to model performances of triathletes as a function of age with coefficients for random effects so that each athlete had a different set of parameters. Bell et al. (2016) used multilevel modeling on Formula 1 race results to account for team and driver effects, and identified the best racers over time. For our model, the polynomial parameters are assumed to come from a common normal distribution. The α i are normally distributed with a mean of 0 and a variance of σ α 2 . The β ib coefficients are drawn from independent normal distributions with different means η β b and variances σ β b 2 corresponding to the different polynomial orders b = 0, 1, …, p.

We choose to keep priors non-informative or weakly informative. For the intercept and coefficient parameters, the support of the prior must be a continuous distribution that spans all real numbers, e.g., a normal distribution. The prior distribution for ω should span all non-negative or positive numbers, and could take the form of an exponential or gamma distribution.

2.3 Adding covariates

In some cases, (e.g., the case study in Section 4) we may be able to collect more player-specific information. We illustrate two examples of how such covariates may be incorporated into the growth-curve model described in Equation (5).

If available, variables such as weight, height, and equipment specifications that change over time could be useful in predicting performance. For example, youth athletes who grow taller more quickly may increase in performance more quickly than expected compared to their slower-growing counterparts. In this example, time-varying, player-specific covariates X it can be included in the trajectory model as an additive term,

(6) θ i t = α i + β i 0 + β i 1 t 1 * + β i 2 t 2 * + β i 3 t 3 * + + β ip t p * e ω ( t t 0 i ) + γ X i t ,

where γ is the vector of coefficients on the covariates describing linear changes over time.

Another variable we might include is player age. The older an athlete, the less time they have in their career and the more quickly their performance stabilizes. However, the impact varies from sport to sport. For example, in target shooting, skill is usually a far superior predictor of performance than age, which is why these sports boast the oldest Olympic athletes with the longest careers. Conversely, gymnastics is a sport that requires an enormous amount of flexibility and dexterity that tends to be only attainable by young athletes. We assume that the coefficient for age is fixed within a sport. Our model incorporates the age of an athlete’s first professional appearance, z i , as

(7) θ i t = α i + β i 0 + β i 1 t 1 * + β i 2 t 2 * + β i 3 t 3 * + + β ip t p * e ( ω 0 + ω 1 z i ) ( t t 0 i ) ,

where ω1 ≥ 0, ω2 ≥ 0 are the linear coefficients describing the stabilization factor. This form of trajectory model is used to fit women’s luge data in Section 4. Another option may be to make our trajectory model a function of age rather than a function of time. Although directly incorporating age into the trajectory model would be straightforward, we choose to index the trajectory as a function of time since the start of the athlete’s professional career and include age as a covariate. Indexing by length of career instead of age makes it easier to compare career trajectories across athletes of different ages with more overlap.

3 Model fitting & selection

We use Markov Chain Monte Carlo (MCMC) sampling for model-fitting. The rank ordered logit likelihood is a special case of a multinomial logit model insofar as it is a telescoping product of multinomial logit probabilities. The extension of including a parametric function for the strength parameter is similar to incorporating a link function in multinomial logistic regression (Chib, Greenberg, and Chen 1998). Bansal et al. (2020) shows how to use MCMC approaches to fitting mixed multinomial logit as described by McFadden and Train (2000). Sampling from the conditional distribution of the parameters in (7) can be accomplished using random walk Metropolis sampling (Haario, Saksman, and Tamminen 1999). Recent work has shown the benefits of using MCMC techniques when fitting non-linear state space models (e.g., Aicher et al. 2019). For our application in Section 4, the approach we use to obtain posterior draws is No U-Turn Sampling (NUTS; Hoffman and Gelman 2014). NUTS is an extension to the Hamiltonian Monte Carlo (HMC) sampler which is easily implemented using the RStan software (Neal 2012).

We select the order of the polynomial via cross-validation (CV). Specifically, we employ leave-one-out cross-validation (LOO-CV) which has historically been applied in the Bayesian context for model selection (e.g., Alqallaf and Gustafson 2001). The LOO-CV method developed by Vehtari, Gelman, and Gabry (2017) maintains a Bayesian framework by using the full posterior to select the best model for predictive accuracy. Bayesian LOO-CV has shown in various numerical experiments to outperform other approaches such as likelihood ratio tests (LRT), novel information-based criteria such as the Watanabe–Akaike Information Criterion (WAIC), and more common information-based approaches (AIC, BIC, and DIC) in some models. It can perform better when optimizing out-of-sample accuracy in a Bayesian setting since it utilizes the full posterior (Luo et al. 2017; Piironen and Vehtari 2017). Bayesian LOO-CV is a technique that proved to perform well in recovering the correct degree polynomial.

Starting at a linear model (p = 1) we use LOO-CV to calculate the difference in expected log pointwise predictive density (ΔELPD) between p and p + 1 and the corresponding standard error. We choose to accept the higher-order polynomial if the ELPD for p + 1 is greater than 2 standard errors away from that of the ELPD for p. Model selection is made separately per sport because we do not expect every sport to have the same polynomial order in the trajectory model. This forward stepwise procedure ensures that we have the simplest model while still capturing the desired flexibility of athlete career trajectories. The forward stepwise procedure remains the same no matter which CV procedure is used.

4 Application to women’s luge

We apply our modeling framework to women’s luge athletes. Women’s luge has been a participating sport in the Winter Olympics since 1964. Luge can be raced as a duo, but for the sake of simplicity, we only focus on individual athlete events. During a single women’s luge event, each athlete gets a pre-determined number of chances to sled down the same designated track. The women are timed from the moment they cross the starting line at the top of the track to when they cross the finish line at the bottom. The final ranking is determined from the cumulative time of all runs. Since the final result is a rank ordering of all participating athletes, luge can be viewed as a multi-competitor sport.

The data set we used was provided by the US Olympic and Paralympic Committee and consists of results from 142 events from the 2004 to the 2017 winter seasons of 166 women. All recorded events are professional-level, single sled women’s luge events. Along with complete ranking results from each event, we have participating athletes’ birthdates.

We separate the events by dividing them into periods. The dates are divided into four evenly spaced periods per year starting on January 1, April 1, July 1, and October 1. The data set spans 50 periods total starting with October 1, 2004 and ending with March 31, 2017. It also features roughly 5–6 events per period in the winter periods (October–March) and no events per period in the off-seasons (April–September). We choose 3-month periods as they are long enough to observe an athlete perform typically at least once within the competition but short enough so that athletes’ abilities likely do not change appreciably within the period. This selection enables us to see details of changes in ability across a typical athlete’s career. We calculate this orthogonal bases t 1 * , t p * using the poly function in R (Chambers and Hastie 1992).

We include age as a covariate following Equation (7). As mentioned in Section 2, each polynomial parameter varies per athlete, but is drawn from the same distribution as the parameters of the same degree. The priors for the stabilization parameters ω0, ω1 are assumed exponential distributions.

α i | σ α 2 Normal 0 , σ α 2 β b i | η β b σ β b 2 Normal η β b , σ β b 2 ω 0 , ω 1 Exp ( 10 )

The priors for the hyperparameters are chosen to be vague. We assume the following vague but proper distributions.

σ α Uniform ( 0,1 0 4 ) η β b Normal ( 0,100 ) σ β b Uniform ( 0,1 0 3 ) .

The model is fit using NUTS in RStan. We fit the model with 15,000 iterations with a burn-in of 13,000, across three chains, for a total of 6000 draws used for inference. To ensure convergence of the NUTS, we fit the model using the three independent chains to calculate the split R ̂ , a modification to the traditional R ̂ suggested by Gelman and Rubin (1992) that compares variation between beginning and end of chain to ensure stationary of the convergence (Gelman et al. 2013). As recommended by Gelman et al. (2013), if any parameter is larger than 1.1 then we should not accept the posterior samples as valid. Figure 1 shows the split- R ̂ values for the model fit to women’s luge data. Although some values are moderately high (>1.05), all fall under 1.1.

Figure 1: 
The 





R

̂



$\hat{R}$


 for posterior draws of all parameters while fitting the model described in Section 4.
Figure 1:

The R ̂ for posterior draws of all parameters while fitting the model described in Section 4.

Model selection was conducted via LOO-CV. The results of the model selection are shown in Table 1. Model 2 shows the degree of the higher degree polynomial compared to Model 1, the lower degree polynomial. The final model was determined to have p = 3:

θ i t = α i + β i 0 + β i 1 t 1 * + β i 2 t 2 * + β i 3 t 3 * e ( ω 0 + ω 1 z i ) t .
Table 1:

Results of LOO-CV on Women’s luge data.

Model 1 Model 2 ΔELPD SE
p = 1 p = 2 18.8 7.1
p = 2 p = 3 29.6 8.2
p = 3 p = 4 −2.8 6.0

4.1 Evaluation

To evaluate the predictive ability of our model, we use a weighted average of the Spearman rank correlation (Spearman 1904) between our projected latent performance and the true race results. This method is also used to evaluate the GH model in Glickman and Hennessy (2015). The reason for using Spearman rank correlation instead of the posterior predictive is to have an evaluation metric that is robust against the large variation in the posterior predictive due to a large number of competitors and a small number of competitions.

The weighted average of the Spearman rank correlation at predicted time T is:

ρ W = k = 1 K T ( m k T 1 ) ρ ̂ k k = 1 K T ( m k T 1 ) ,

where K T is the number of events at time T and m kT is the number of athletes competing in event k = 1, …, K T . We evaluate ρ W at each iteration of the posterior draw. To estimate ρ ̂ k we first evaluate θ iT at desired time T, for each player that appears in event k using the posterior draws of the trajectory model parameters. We then draw the latent performance Y iT from the Gumbel distribution with location θ iT . We can then calculate the Spearman rank correlation between the true event results and the sampled latent variables to obtain ρ ̂ k .

Table 2 shows the evaluation for the women’s luge example in several settings. First, we remove only the final year of results for each athlete and fit the growth-curve model. We use this model built with missing data to predict the results for matches in the two out of sample observed periods of that year.[1] We then remove the final 2 years of results to fit the data and predict the results for matches in the four observed periods across both years. Figure 2 shows the estimated weighted Spearman correlation and 95 % credible interval across the four periods within the 2 years of withheld information. Notice that the correlation does not drop, even evaluated out greater than four periods, showing that our methods are not just valuable for current predictions, but can extrapolate far beyond the end of the observed data.

Table 2:

Estimated weighted Spearman correlation for predictions, removing the last 1 year and last 2 years. Predictions are made in each of the two periods with observations per year to calculate the estimated weighted Spearman correlation.

Projection interval Prediction time interval GH model
1 year 2 years
7–9 months 0.651 0.599 0.786
10–12 months 0.639 0.584 0.639
19–21 months 0.591 0.728
22–24 months 0.600 0.426
Figure 2: 
The estimate of the weighted Spearman correlation and 95 % CI. The square points show the performance of the GH model.
Figure 2:

The estimate of the weighted Spearman correlation and 95 % CI. The square points show the performance of the GH model.

We compare the Spearman correlation of our trajectory model to that of the GH model calculated on the same data. Table 2 shows the Spearman correlation on the test data with 2 years (four periods) held out. The Spearman correlation from using the GH model is shown by the gold boxes in Figure 2). Our model performs similarly across all predictions, even up to 8 periods where the GH model has more variation from period to period as well as has a drop in performance in the last two periods compared to the first two. This highlights the advantage of using the trajectory model with Plackett–Luce, which allows us to include not only the estimated performance at a given time, but how we predict that might change as time progresses.

4.2 Prediction summaries

Figure 3 shows the fitted player strength parameters with 95 % credible intervals for four of the top athletes in luge: Alex Gough (CAN), Natalie Geisenberger (GER), Summer Britcher (USA) and Tatjana Hüfner (GER). These four athletes started competing at different dates so they were at different stages of their career. Hüfner, the oldest, was beginning to decline in performance, whereas Britcher was the youngest athlete and had a sharp increase in performance over the past 3 years. Geisenberger has had a steady career and always performed at the top, whereas Gough took several years to reach a similar strength.

Figure 3: 
Growth-curve model fit with 95 % credible intervals to the observed time range for four of the top women’s luge athletes. Plotting the trajectory models against one another lets us compare the different levels of performance between the athletes at a given time.
Figure 3:

Growth-curve model fit with 95 % credible intervals to the observed time range for four of the top women’s luge athletes. Plotting the trajectory models against one another lets us compare the different levels of performance between the athletes at a given time.

Figure 4 shows the same trajectories but aligned by years since the start of each athlete’s career tt0i. This view allows us to compare trajectories between athletes while they are at the same points in their career. As we can see, the top luge athletes share similarly shaped trajectories. One point of interest for coaches or team managers would be comparing younger players’ career trajectories to the trajectories of their more established counterparts. In Figure 5, we show Britcher’s career projected out 7 years compared to Geisenberger’s current career trajectory. By plotting the trajectories with their respective 95 % credible intervals side-by-side, aligned at the beginning of their careers, we can see that they have similarly shaped trajectories, but that Britcher never reaches the level of performance of Geisenberger.

Figure 4: 
The same trajectory model fits with 95 % credible intervals as in Figure 3 but aligned at the beginning of the career.
Figure 4:

The same trajectory model fits with 95 % credible intervals as in Figure 3 but aligned at the beginning of the career.

Figure 5: 
Career trajectory and 95 % credible interval of Natalie Geisenberger next to the projected career trajectory and 95 % credible interval of Summer Britcher aligned at the start of their careers. The gray vertical line shows the current point, whereas to the right is the predicted strength for Britcher.
Figure 5:

Career trajectory and 95 % credible interval of Natalie Geisenberger next to the projected career trajectory and 95 % credible interval of Summer Britcher aligned at the start of their careers. The gray vertical line shows the current point, whereas to the right is the predicted strength for Britcher.

We can also use our model fit to understand the general trajectory of athletes in a particular sport. To do so, we take advantage of the hierarchical structure of the polynomial coefficients and construct a curve using the estimated means,

(8) θ ̄ = η 0 + η 1 t 1 * + + η p t p * e ω ( t 1 ) .

In the women’s luge case we incorporate the average starting age across all players z ̄ ,

(9) θ ̄ = η 0 + η 1 t 1 * + η 2 t 2 * + η 3 t 3 * e ( ω 0 + ω 1 z ̄ ) ( t 1 ) .
Figure 6 compares the mean curve between women’s luge and a second example, men’s slalom skiing. The curve tells us that the average luge athlete is expected to increase their performance from the start of their career for about 5 years, at which time the athlete reaches a plateau. This trajectory is different from that of men’s slalom athletes, who are expected to begin declining in performance after 6–7 years. This difference may be because slalom skiing is a more-physical sport; therefore, longer careers are less likely. This contrasts luge, which is a more-skilled sport where age is a less important factor in performance. Although this curve cannot explain such phenomena, it is still useful in comparing the common shape of the trajectory between the two sports.
Figure 6: 
The mean curves for women’s luge (left) and men’s slalom (right). By looking at the mean curves, we can compare the typical trajectory of athletes between two sports. Note that the y-axis differs between the two images.
Figure 6:

The mean curves for women’s luge (left) and men’s slalom (right). By looking at the mean curves, we can compare the typical trajectory of athletes between two sports. Note that the y-axis differs between the two images.

4.3 Clustering career trajectories

In Section 4, we compare sports by estimating the mean trajectory model. Here, we perform an exploratory analysis to understand how athletes share similarities to one another. An exploratory way of finding similarities is to cluster career trajectories. Bartolucci and Murphy (2015) fits a finite mixture model on runners in a 24-hour-long endurance race with the intent of identifying strategies amongst clusters of runners based on their performance. Such an analysis can be useful in finding athletes with a potentially similar style or technique, or help understand what makes athletes more successful during a certain career stage. To enable the exploration of the answers to these questions, we cluster the estimated career trajectories of the athletes. Instead of clustering by the predicted rating, we cluster the estimated trajectory model parameters. Clustering based on estimated trajectory model parameters has the advantage of being able to describe the athlete’s entire career trajectory, rather than just the observed years.

Many techniques for clustering trajectories can be used, including two-stage, nonparametric, and model-based methods (Jacques and Preda 2014). In particular, we focus on two-stage methods which first reduce the dimensionality of the data by providing a functional basis, and then use a non-parametric clustering approach to cluster the observations using this simplified basis. Abraham et al. (2003) first fits B-splines to the longitudinal data set and uses k-means clustering to the corresponding parameters. In an application to sports, Miller and Bornn (2017) cluster segments of the National Basketball Association athletes’ movements during possession by mapping them to a set of characteristic actions.

Instead of clustering against point estimates of the parameters, we want to utilize the entire distribution of these parameters to retain the uncertainty of the estimates. The following definition of the Wasserstein distance is a good metric for measuring the distance between two empirical distributions, which in our case are samples from the posterior. For simplicity, we use the distance between the marginal coefficients, although we recognize we ignore correlation. If we define B ib as the empirical distribution of the posterior of β ib , for athlete i and B jb to be the equally sized empirical distribution of β jb for athlete j then the distance between the two distributions is defined as a function of the ordered posterior draws,

(10) W 2 ( B i b , B j b ) = k = 1 n β i b ( k ) β j b ( k ) 2 1 / 2 ,

where β(k) indicates the kth ordered draw. The total distance between two athletes is defined as

(11) d i j = p = 1 P W 2 ( B i p , B j p ) ,

where P is the order selected through model selection. The β ib are centered by η p and standardized by σ β b . The intercept terms β0, α are ignored since we are more interested in the shape of the trajectories in this clustering exercise rather than the value of the rating.

We perform agglomerative hierarchical clustering using the distance matrix created via the Wasserstein distance (Ward 1963). Hierarchical clustering methods are advantageous because they enable us to perform clustering before determining the number of partitions. We use the average silhouette method to determine the number of clusters (Rousseeuw 1987). The optimal number of clusters for the women’s luge trajectories is two clusters, but we will investigate the second most optimal number of clusters, four clusters. Table 3 records the number of athletes within each cluster for both clustering schemes. Details about the clustering methods and finding the optimal number of clusters can be found in Appendix B.

Table 3:

The distribution of athlete’s across clusters for a cut in the hierarchical clustering to create six clusters.

Cluster number Athletes
1 91
2 11
3 42
4 22

Figure 7 shows the mean and interquartile range (IQR) of the estimates of the career trajectories within each cluster at each period, taken from the fitted trajectory models from all men’s slalom athletes. Immediately, we can see the separation between groups of athletes by simply clustering on three parameters. The clustering method appears to divide the trajectories into athletes with mostly increasing ratings (cluster 1, 3), athletes with ratings that increase and then decrease (cluster 2), and athletes that remain the same or increase slightly (cluster 4). Clusters 1 and 3 contain athletes whose ratings increase steadily throughout their career, yet cluster 3 appears to have athletes who increase at a quicker rate. The athletes in cluster 2 at first increase as quickly as those in cluster 3, but then at some point begin to level off or decrease in rating. Cluster 4 appears to have athletes who have a lower rating than athletes in the other clusters and don’t change much over time.

Figure 7: 
Mean and IQR of the estimated career trajectory for men’s slalom athletes within each of the four clusters.
Figure 7:

Mean and IQR of the estimated career trajectory for men’s slalom athletes within each of the four clusters.

Figure 8 shows the section of the dendrogram and corresponding projected career trajectories for the athletes in cluster 2. The trajectories feature a similar shape, increasing ratings in the earlier part of their career and then leveling off or decreasing in rating after about 5 years. We can use the dendrogram to find athletes who are more alike by following the branches from the top (larger distances) to the bottom (smaller distances). For example, Manzenreiter and Oberstolz-Antonova are the last branch in the dendrogram and have roughly the same trajectory shape. Hüfner was the final athlete to be included in the cluster, and has a more dramatic upward trajectory in the first 5 years and downward trajectory in the second 5 years than other athletes in cluster 2. Hüfner and Britcher, whose trajectories we compared in Section 4.2 are both assigned to this cluster and show similar shape.

Figure 8: 
The section of the dendrogram representing the path of the clustering and the distance between each of women’s luge athletes within cluster 2 (top). The estimated trajectories and 95 % CI for the members of cluster 2 (bottom).
Figure 8:

The section of the dendrogram representing the path of the clustering and the distance between each of women’s luge athletes within cluster 2 (top). The estimated trajectories and 95 % CI for the members of cluster 2 (bottom).

5 Conclusions

The approach presented in this paper successfully models an athlete’s strength in multi-competitor sports, allowing the user to compare current and future athlete performance. The trajectory model was constructed to be as flexible as needed to fit even the most irregular career trajectories. Because there are no sport-specific assumptions made in the creation of the Plackett–Luce likelihood or trajectory model, this method is generalizable to any sport.

We recognize that since we only observe athletes while they perform in professional tournaments, our method is susceptible to some selection bias since we do not include those athletes who do not qualify for every event or retire early. This likely causes bias in our predictions of the athlete strength. It may also cause an underestimation of uncertainty over long-term predictions. Alternatives, such as removing the game results in the final periods in which an athlete competes, can help mitigate this “survivor” bias, but it comes at the cost of reducing the information in the data.

Our approach is intended to be applied to a single sport at a time, but by comparing the distributions of the random effects (α, β) and the stabilization factor (ω) we can make comparisons between sports. We also introduce the mean curve, which is an estimate of the approximate shape of a career trajectory for a typical athlete in the sport. The characteristic mean curves could also be used to compare the typical career trajectory across sports for insight about when we expect players to reach their peak performance, on average. This estimate is particularly useful for athletes who have few observed appearances in games or matches, providing little information about their career trajectory.

We present two applications of this trajectory model. The first is an exploratory extension by clustering the trajectories across different athletes. Clustering athlete trajectories based on their trajectory models is easier than clustering on raw ratings and can give us information about an athlete’s entire career trajectory rather than only from observed periods. We also use the mean curve as a method to quickly compare trajectories across sports. Merely looking at plots of the mean curve can reinforce qualitative comparisons based on what we know about the sport, such as the physicality of the sport, the typical age of athletes, and other similar characteristics. We choose a two step method to quickly cluster the fitted trajectory models, but more work could be done to incorporate latent groups into the mixture model itself.


Corresponding author: Katy McKeough, Boston Red Sox, 4 Jersey Street, Boston, MA 02215, USA, E-mail:

Funding source: United States Olympic & Paralympic Committee

  1. Research ethics: Not applicable.

  2. Author contributions: The authors haveaccepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: All other authors state no conflict of interest.

  4. Research funding: United States Olympic&Paralympic Committee.

  5. Data availability: Not applicable.

Appendix A: Orthogonal polynomials

Understanding the creation of the orthogonal polynomials is important because the process must be replicated to project new time points into the orthogonal space. We start with a sequence of discretized time points t = {0, 1, …, T}. We want to create a orthogonal basis of t , t 2, …, t p . We will denote the orthonormal basis as t 1 * , t 0 * , t 1 * , , t p * and the corresponding orthogonal basis as u −1, u 0, u 1, …, u p where t i * = u i / u 1 . This can be done by the following steps:

  1. Center all elements of t by the mean t ̄ :

    t = t t ̄ ; t ̄ = 1 T + 1 i = 0 T i
  2. Set basis vectors u −1 = 0, u 0 = 1, u 1 = t ′ which are trivially orthogonal.

  3. Define i =< u i , u i >, α i = < u i t , t > i . Note that the L2-norm is u i = i .

  4. For i = 2, …, p we can now solve for the orthogonal polynomial components using the Gram Schmidt process on increasing orders of time, rewritten using a recursion relation.

    i t i * = u i = t u i 1 j = 0 i 1 < t u i 1 , u j > j u j = t α i 1 u i 1 i 1 i 2 u i 2

Ignoring t 1 * and t 0 * , we now have a set of orthogonal polynomials t 1 * t p * up to time T. The model is fit using these orthogonal polynomials. In practice, the orthogonal polynomial basis was created using the poly function in R, which also returns all i and α i coefficients.

In order to make projections to include future time periods s = {0, 1, …, T, T + 1, T + 2, …} we must project these values into the same orthogonal space we use to fit the parameters in our model. To do so, we keep the same coefficients, i and α i , and use them to project the time points s into the same space using the equation in step 4s. Explicitly, if we created our orthogonal basis using t i * ( 0,1 , , T ) ; i = 1 , , p , but we want to extend it to t i * = u i = t i * ( 0,1 , , T ) , t i * ( T + 1 , T + 2 , ) then we modify step 4 to be:

i t i * = ( ( s t ̄ ) α i 1 ) u i 1 i 1 i 2 u i 2 ,

where i , α i , and t ̄ are calculated only using t i * ( 0,1 , , T ) . Note that the new polynomial vectors are not themselves orthonormal, but are projected into the same space we fit the coefficients.

Appendix B: Hierarchical clustering

Agglomerative hierarchical clustering begins with all observations as separate clusters, and iteratively combines observations and clusters in the order of smallest to largest distance. We choose a type of linkage or method to merge clusters one at a time, in this case we choose the Ward’s method. Hierarchical clustering will always merge the observations or clusters with the smallest distance as determined by Ward’s method at each step. A way of visualizing this process is looking at the partial dendrogram in Figure 8, where the branches represent at what distance (y-axis) each cluster or observation was merged. Ward’s method provides a decision metric based on the distance between two clusters and the noise within each cluster. This method tends to make compact clusters and perform well when clusters are noisy. The Ward’s metric is between two clusters C a and C b is

(12) Δ ( C a , C b ) = i C a j C b d i j ,

where d ij is the distance between athletes i and j. The main downside to this type of clustering is that it is greedy: once a cluster is merged together, it will never be separated.

Using the silhouette statistic is a good way to evaluate cluster fit. The silhouette statistic is calculated for each observation and measures how well the observation fits within its labeled cluster, and how far away it is from the next closest cluster. Mathematically, this concept is written as:

(13) s i = b i a i max ( a i , b i ) ,

where a i is the average distance between an observation and other observations in its assigned cluster and b i is the average distance between an observation and other observations in the next closest cluster. If s i is close to 1, then the observation is thought to be well assigned and if it is negative it is likely incorrectly assigned. The silhouette statistic can also be used in selecting the optimal number of clusters. If we define

s ̄ k = 1 n i n s i

then the optimal number of clusters would be max k s ̄ k . Figure 9 shows the s i for 2 through 15 clusters. The optimal number of clusters is 2 and the second most optimal is 4.

Figure 9: 
Average silhouette values for number of clusters 2 through 15. The maximum average silhouette occurs with well-defined clusters, denoted by the red dotted line.
Figure 9:

Average silhouette values for number of clusters 2 through 15. The maximum average silhouette occurs with well-defined clusters, denoted by the red dotted line.

References

Abraham, C., P. A. Cornillon, E. Matzner-Løber, and N. Molinari. 2003. “Unsupervised Curve Clustering Using B-Splines.” Scandinavian Journal of Statistics 30: 581–95. https://doi.org/10.1111/1467-9469.00350.Search in Google Scholar

Aicher, C., Y.-A. Ma, N. J. Foti, and E. B. Fox. 2019. “Stochastic Gradient MCMC for State Space Models.” SIAM Journal on Mathematics of Data Science 1: 555–87. https://doi.org/10.1137/18m1214780.Search in Google Scholar

Alqallaf, F., and P. Gustafson. 2001. “On Cross-Validation of Bayesian Models.” Canadian Journal of Statistics 29: 333–40. https://doi.org/10.2307/3316081.Search in Google Scholar

Baker, R. D., and I. G. McHale. 2013. “Forecasting Exact Scores in National Football League Games.” International Journal of Forecasting 29: 122–30. https://doi.org/10.1016/j.ijforecast.2012.07.002.Search in Google Scholar

Bansal, P., R. Krueger, M. Bierlaire, R. A. Daziano, and T. H. Rashidi. 2020. “Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations.” Transportation Research Part B: Methodological 131: 124–42. https://doi.org/10.1016/j.trb.2019.12.001.Search in Google Scholar

Bartolucci, F., and T. B. Murphy. 2015. “A Finite Mixture Latent Trajectory Model for Modeling Ultrarunners’ Behavior in a 24-hour Race.” Journal of Quantitative Analysis in Sports 11: 193–203. https://doi.org/10.1515/jqas-2014-0060.Search in Google Scholar

Bell, A., J. Smith, C. E. Sabel, and K. Jones. 2016. “Formula for Success: Multilevel Modelling of Formula One Driver and Constructor Performance, 1950–2014.” Journal of Quantitative Analysis in Sports 12: 99–112. https://doi.org/10.1515/jqas-2015-0050.Search in Google Scholar

Bradley, R. A., and M. E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39: 324–45. https://doi.org/10.2307/2334029.Search in Google Scholar

Bradlow, E. T., and P. S. Fader. 2001. “A Bayesian Lifetime Model for the Hot 100 Billboard Songs.” Journal of the American Statistical Association 96: 368–81. https://doi.org/10.1198/016214501753168091.Search in Google Scholar

Brander, J. A., E. J. Egan, and L. Yeung. 2014. “Estimating the Effects of Age on NHL Player Performance.” Journal of Quantitative Analysis in Sports 10: 241–59. https://doi.org/10.1515/jqas-2013-0085.Search in Google Scholar

Caron, F., and Y. W. Teh. 2012. “Bayesian Nonparametric Models for Ranked Data.” Advances in Neural Information Processing Systems 2: 1520–8.Search in Google Scholar

Chambers, J., and T. Hastie. 1992. Statistical Models in S, 1st ed. London: CRC Press.Search in Google Scholar

Chib, S., E. Greenberg, and Y. Chen. 1998. “MCMC Methods for Fitting and Comparing Multinomial Response Models.” Econometrics 9802001: 1–28.Search in Google Scholar

Gelman, A., and D. B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7: 457–511. https://doi.org/10.1214/ss/1177011136.Search in Google Scholar

Gelman, A., J. Carlin, H. Stern, D. Dunson, A. Vehtari, and D. Rubin. 2013. Bayesian Data Analysis. New York: Chapman and Hall/CRC.10.1201/b16018Search in Google Scholar

Glickman, M. E., and J. Hennessy. 2015. “A Stochastic Rank Ordered Logit Model for Rating Multi-Competitor Games and Sports.” Journal of Quantitative Analysis in Sports 11: 131–44. https://doi.org/10.1515/jqas-2015-0012.Search in Google Scholar

Glynn, C., and S. T. Tokdar. 2017. “A Switching Dynamic Generalized Linear Model to Detect Abnormal Performances in Major League Baseball.” In MIT Sloan Sports Analytics Conference, 1–29.10.4324/9781351148641-2Search in Google Scholar

Haario, H., E. Saksman, and J. Tamminen. 1999. “Adaptive Proposal Distribution for Random Walk Metropolis Algorithm.” Computational Statistics 14: 375–95. https://doi.org/10.1007/s001800050022.Search in Google Scholar

Hoffman, M. D., and A. Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15: 1593–623.Search in Google Scholar

Jacques, J., and C. Preda. 2014. “Functional Data Clustering: A Survey.” Advances in Data Analysis and Classification 8: 231–55. https://doi.org/10.1007/s11634-013-0158-y.Search in Google Scholar

Jensen, S. T., B. B. McShane, and A. J. Wyner. 2009. “Hierarchical Bayesian Modeling of Hitting Performance in Baseball.” Bayesian Analysis 4: 631–52. https://doi.org/10.1214/09-ba424.Search in Google Scholar

Luce, R. D. R. D. 1959. Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.Search in Google Scholar

Luo, Y., K. Al-Harbi, Y. Luo, and K. Al-Harbi. 2017. “Performances of LOO and WAIC as IRT Model Selection Methods.” Psychological Test and Assessment Modeling 59: 183–205. https://doi.org/10.1186/s12886-017-0580-6.Search in Google Scholar PubMed PubMed Central

Malcata, R. M., W. G. Hopkins, and S. N. Pearson. 2014. “Tracking Career Performance of Successful Triathletes.” Medicine & Science in Sports & Exercise 46: 1227–34. https://doi.org/10.1249/mss.0000000000000221.Search in Google Scholar PubMed

McFadden, D., and K. Train. 2000. “Mixed MNL Models for Discrete Response.” Journal of Applied Econometrics 15: 447–70. https://doi.org/10.1002/1099-1255(200009/10)15:5<447::aid-jae570>3.0.co;2-1.10.1002/1099-1255(200009/10)15:5<447::AID-JAE570>3.0.CO;2-1Search in Google Scholar

Miller, A. C., and L. Bornn. 2017. “Possession Sketches: Mapping NBA Strategies.” In Proc. 11th Annual MIT Sloan Sports Analytics Conference, 1–12.Search in Google Scholar

Moudud, A., K. Carling, R. Chen, and Y. Liang. 2008. “How to Determine the Progression of Young Skiers?” Chance 21: 13–9, https://doi.org/10.1080/09332480.2008.10722927.Search in Google Scholar

Neal, R. M. 2012. “MCMC Using Hamiltonian Dynamics.” Handbook of Markov Chain Monte Carlo, https://doi.org/10.1201/b10905-6.Search in Google Scholar

Panik, M. J. 2014. Growth Curve Modeling: Theory and Applications. Hoboken: John Wiley & Sons, Inc.10.1002/9781118763971Search in Google Scholar

Piironen, J., and A. Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27: 711–35. https://doi.org/10.1007/s11222-016-9649-y.Search in Google Scholar

Pinheiro, J. C., and D. M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. New York: Springer.10.1007/978-1-4419-0318-1Search in Google Scholar

Plackett, R. 1975. “The Analysis of Permutations.” Applied Statistics 24: 193–202. https://doi.org/10.2307/2346567.Search in Google Scholar

Rousseeuw, P. J. 1987. “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis.” Journal of Computational and Applied Mathematics 20: 53–65. https://doi.org/10.1016/0377-0427(87)90125-7.Search in Google Scholar

Spearman, C. 1904. “The Proof and Measurement of Association Between Two Things.” American Journal of Psychology 100: 441–71. https://doi.org/10.2307/1422689.Search in Google Scholar

Vehtari, A., A. Gelman, and J. Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and Waic.” Statistics and Computing 27: 1413–32. https://doi.org/10.1007/s11222-016-9696-4.Search in Google Scholar

Ward, J. H. 1963. “Hierarchical Grouping to Optimize an Objective Function.” Journal of the American Statistical Association 58: 236–44. https://doi.org/10.1080/01621459.1963.10500845.Search in Google Scholar

Received: 2021-04-05
Accepted: 2023-09-19
Published Online: 2023-11-01
Published in Print: 2024-03-25

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 4.5.2024 from https://www.degruyter.com/document/doi/10.1515/jqas-2021-0034/html
Scroll to top button