Abstract
Rating data are ordinal categorical data routinely collected in survey sampling. The response in such applications is confined to a finite number of ordered categories. Because of population heterogeneity, respondents may follow several different rating styles, so a finite mixture model is a natural choice for data of this kind. In this paper, we propose a two-component mixture of shifted binomial distributions for rating data. We show that the model is identifiable and propose a numerically stable penalized likelihood approach to parameter estimation, computed with an adapted expectation-maximization (EM) algorithm. Our simulation results indicate that the penalized maximum likelihood estimator is consistent and effective. We fit the proposed model and other models from the literature to several real-world datasets and find that the proposed model can fit substantially better.
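To make the estimation idea concrete, the following is a minimal sketch of a penalized EM fit for a two-component shifted binomial mixture on ratings 1..m. Several pieces are assumptions made for illustration, not details taken from the abstract: the parameterization R - 1 ~ Binomial(m - 1, theta), the penalty form C log(4 pi (1 - pi)) on the mixing proportion, the starting values, and the names `shifted_binomial_pmf`, `fit_mixture_em` and `draw`. The penalty used in the paper may differ.

```python
import math
import random


def shifted_binomial_pmf(r, m, theta):
    """P(R = r) for r in 1..m, assuming R - 1 ~ Binomial(m - 1, theta)."""
    return math.comb(m - 1, r - 1) * theta ** (r - 1) * (1 - theta) ** (m - r)


def fit_mixture_em(ratings, m, penalty=1.0, n_iter=500, tol=1e-10):
    """EM for pi * SB(theta1) + (1 - pi) * SB(theta2).

    Assumed (illustrative) penalty: penalty * log(4 * pi * (1 - pi)),
    which keeps the mixing proportion away from 0 and 1.
    """
    n = len(ratings)
    counts = [ratings.count(r) for r in range(1, m + 1)]  # frequency per category
    pi, th1, th2 = 0.5, 0.25, 0.75  # arbitrary starting values
    eps = 1e-8  # keeps success probabilities strictly inside (0, 1)
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: posterior weight of component 1 for each rating category
        sw = swy = sy = loglik = 0.0
        for r, c in enumerate(counts, start=1):
            if c == 0:
                continue
            a = pi * shifted_binomial_pmf(r, m, th1)
            b = (1 - pi) * shifted_binomial_pmf(r, m, th2)
            w = a / (a + b)
            sw += c * w
            swy += c * w * (r - 1)
            sy += c * (r - 1)
            loglik += c * math.log(a + b)
        pen_loglik = loglik + penalty * math.log(4 * pi * (1 - pi))
        if pen_loglik - prev < tol:  # stop once the penalized objective stalls
            break
        prev = pen_loglik
        # M-step: closed-form updates; the penalty acts like `penalty`
        # pseudo-observations added to each component
        pi = (sw + penalty) / (n + 2 * penalty)
        th1 = min(max(swy / ((m - 1) * sw), eps), 1 - eps)
        th2 = min(max((sy - swy) / ((m - 1) * (n - sw)), eps), 1 - eps)
    return pi, th1, th2


def draw(m, pi, th1, th2):
    """Simulate one rating from the assumed mixture."""
    theta = th1 if random.random() < pi else th2
    return 1 + sum(random.random() < theta for _ in range(m - 1))


random.seed(0)
sample = [draw(7, 0.4, 0.2, 0.8) for _ in range(2000)]
print(fit_mixture_em(sample, 7))
```

With this penalty the update for the mixing proportion is (sum of weights + C) / (n + 2C), so it can never reach the boundary, which is one simple way to obtain the numerical stability the abstract refers to. Component labels are fixed only by the starting values, so the two fitted components may be swapped under a different initialization.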
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 11701071, 11871419), the Natural Science and Engineering Research Council (Grant No. 2019–04204) and the Scientific Research Projects of Dongbei University of Finance and Economics (Grant No. 20210261). We thank the associate editor and the referee for their helpful comments and suggestions.
Cite this article
Li, S., Chen, J. Mixture of shifted binomial distributions for rating data. Ann Inst Stat Math 75, 833–853 (2023). https://doi.org/10.1007/s10463-023-00865-7
DOI: https://doi.org/10.1007/s10463-023-00865-7