Variable Importance Without Impossible Data

Masayoshi Mase; Art B. Owen; Benjamin B. Seiler

doi:10.1146/annurev-statistics-040722-045325

Annual Review of Statistics and Its Application

Volume 11, 2024

Review Article

Open Access

Masayoshi Mase¹, Art B. Owen², and Benjamin B. Seiler²

Affiliations: ¹Research and Development Group, Hitachi, Ltd., Kokubunji, Tokyo, Japan; ²Department of Statistics, Stanford University, Stanford, California, USA
Vol. 11:153-178 (Volume publication date April 2024) https://doi.org/10.1146/annurev-statistics-040722-045325
First published as a Review in Advance on August 25, 2023
Copyright © 2024 by the author(s).

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See credit lines of images or other third-party material in this article for license information.

Abstract

The most popular methods for measuring importance of the variables in a black-box prediction algorithm make use of synthetic inputs that combine predictor variables from multiple observations. These inputs can be unlikely, physically impossible, or even logically impossible. As a result, the predictions for such cases can be based on data very unlike any the black box was trained on. We think that users cannot trust an explanation of the decision of a prediction algorithm when the explanation uses such values. Instead, we advocate a method called cohort Shapley, which is grounded in economic game theory and uses only actually observed data to quantify variable importance. Cohort Shapley works by narrowing the cohort of observations judged to be similar to a target observation on one or more features. We illustrate it on an algorithmic fairness problem where it is essential to attribute importance to protected variables that the model was not trained on.

Keywords: algorithmic fairness, sensitivity analysis, Shapley value
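
The cohort narrowing described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that two observations count as similar on a feature when their values match exactly, that the value of a cohort is the mean response over its members, and that the number of features is small enough to enumerate every subset. The names `exact_match` and `cohort_shapley` are hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def exact_match(column, value):
    # Assumption: "similar" means equal to the target's value on this feature.
    return column == value


def cohort_shapley(X, y, target, similar=exact_match):
    """Shapley attributions for row `target`, computed from observed rows only."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # match[i, j] is True when observation i is similar to the target on feature j.
    match = np.column_stack([similar(X[:, j], X[target, j]) for j in range(d)])

    def value(features):
        # Cohort: rows similar to the target on every feature in `features`.
        # The target matches itself, so the cohort is never empty.
        in_cohort = match[:, list(features)].all(axis=1)
        return y[in_cohort].mean()

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for subsets of this size.
            weight = 1.0 / (d * comb(d - 1, size))
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


# Toy data: four observations, two binary features, explained at the last row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0.0, 1.0, 2.0, 3.0])
phi = cohort_shapley(X, y, target=3)
```

On this toy data the attributions are 1.0 for the first feature and 0.5 for the second. They sum to 1.5, the gap between the target's response (3.0) and the overall mean (1.5). No synthetic input is ever formed: every quantity is an average over rows that were actually observed.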


