
Characterization of the optimal average cost in Markov decision chains driven by a risk-seeking controller

Published online by Cambridge University Press:  21 July 2023

Rolando Cavazos-Cadena* (Universidad Autónoma Agraria Antonio Narro)
Hugo Cruz-Suárez** (Benemérita Universidad Autónoma de Puebla)
Raúl Montes-de-Oca*** (Universidad Autónoma Metropolitana-Iztapalapa)
*Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Boulevard Antonio Narro 1923, Buenavista, COAH 25315, México. Email: rolando.cavazos@uaaan.edu.mx
**Postal address: Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla, Ave. San Claudio y Río Verde, Col. San Manuel CU, PUE 72570, México. Email: hcs@fcfm.buap.mx
***Postal address: Departamento de Matemáticas, Universidad Autónoma Metropolitana-Iztapalapa, Av. Ferrocarril San Rafael Atlixco 186, Col. Leyes de Reforma Primera Sección, Alcaldía Iztapalapa, CDMX 09310, México. Email: momr@xanum.uam.mx

Abstract

This work concerns Markov decision chains on a denumerable state space endowed with a bounded cost function. The performance of a control policy is assessed by a long-run average criterion as measured by a risk-seeking decision maker with constant risk-sensitivity. Besides standard continuity–compactness conditions, the framework of the paper is determined by the following conditions: (i) the state process is communicating under each stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this framework it is shown that (i) the optimal superior and inferior limit average value functions coincide and are constant, and (ii) the optimal average cost is characterized via an extended version of the Collatz–Wielandt formula in the theory of positive matrices.
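For context, the classical Collatz–Wielandt formula states that the Perron root ρ(A) of an irreducible nonnegative matrix A satisfies ρ(A) = max_{x > 0} min_i (Ax)_i / x_i = min_{x > 0} max_i (Ax)_i / x_i; the paper extends a formula of this type to characterize the optimal average cost. The sketch below is not from the paper: it only verifies the classical matrix version numerically in pure Python, with an illustrative 2×2 matrix and test vector chosen for the example.

```python
# Illustration of the classical Collatz-Wielandt formula for a positive
# matrix A: for any positive vector x, the ratios (Ax)_i / x_i bracket
# the Perron root rho(A). (Example matrix and vector are illustrative.)

def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def perron_root(A, iters=200):
    """Approximate the Perron eigenvalue of A by power iteration."""
    x = [1.0] * len(A)
    for _ in range(iters):
        y = mat_vec(A, x)
        s = max(y)           # normalize to avoid overflow
        x = [yi / s for yi in y]
    y = mat_vec(A, x)
    # At the Perron eigenvector all ratios (Ax)_i / x_i coincide with rho(A).
    return max(y[i] / x[i] for i in range(len(x)))

A = [[2.0, 1.0],
     [1.0, 3.0]]             # positive, hence irreducible
rho = perron_root(A)         # exact value is (5 + sqrt(5)) / 2

# Collatz-Wielandt bracketing for an arbitrary positive test vector:
x = [1.0, 2.0]
y = mat_vec(A, x)
lower = min(y[i] / x[i] for i in range(len(x)))  # <= rho(A)
upper = max(y[i] / x[i] for i in range(len(x)))  # >= rho(A)
assert lower <= rho <= upper
```

Maximizing the lower bound (or minimizing the upper bound) over all positive vectors recovers ρ(A) exactly; the paper's contribution is an extension of this variational characterization to the optimal risk-seeking average cost.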

Type: Original Article

Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust

