Abstract
For a random variable, the superdistribution has emerged as a valuable probability concept. Like the cumulative distribution function (CDF), it uniquely defines the random variable and can be evaluated with a simple one-dimensional minimization formula. This work leverages the structure of that formula to introduce the buffered CDF (bCDF) and the reduced CDF (rCDF) for random vectors. bCDF and rCDF are shown to be the minimal Schur-convex upper bound and the maximal Schur-concave lower bound of the multivariate CDF, respectively. The special structure of bCDF and rCDF is used to construct an algorithm for solving optimization problems with bCDF and rCDF in the objective or constraints. The efficiency of the algorithm is demonstrated in a case study on optimization of a collateralized debt obligation with bCDF functions in constraints.
1 Introduction
The cumulative distribution function (CDF) of a random variable (r.v.) is a fundamental notion in probability theory and plays a central role in stochastic optimization, risk management, statistics, reliability theory and various applications. For instance, engineered systems, e.g., electrical grids and gas/oil pipelines, must be designed to comply with various safety and reliability regulations formulated in terms of the probability of failure calculated with CDFs. Also, smart electrical grids should be robust to rare events including power station faults and electromagnetic pulses, whereas gas pipeline operations should meet the consumer demand while being resilient to unforeseen production disruptions. The design of engineered structures and industrial systems involves multiple uncertain characteristics that are difficult to combine into a single loss function. For example, each watertight compartment of a submarine should be designed to maximize its survivability in case of an accident. Suppose that a submarine has eight compartments. The chances of failure of all compartments because of high outside pressure can be described by an eight-dimensional CDF accounting for failures of individual compartments (which are not independent because of common-cause factors: similar compartment design, outside water pressure, etc.). Another example is to ensure an uninterrupted power supply from a system of redundant emergency generators in case of a natural disaster. All generators may be impacted by the same factors, and failures of individual generators are not independent. Yet another example is to evaluate the probability that several escape roads will be blocked simultaneously because of common factors such as a hurricane. In all of these examples, the probability of an event should be assessed for a random vector rather than an r.v.
From an optimization perspective, one of the technical difficulties that engineers face when using a multivariate CDF is that for any fixed \(\textbf{x}\in {{\mathbb {R}}}^n\), the CDF \(F_\textbf{X}(\textbf{x})\) of a random vector \(\textbf{X}\) is not a convex function of \(\textbf{X}\). Moreover, CDFs based on observations (i.e., discrete distributions) are discontinuous piecewise constant functions of \(\textbf{x}\). In this case, stochastic optimization problems result in mixed-integer optimization problems, which are notoriously hard to solve for a large number of scenarios. Also, the use of CDF in decision problems has two conceptual shortcomings. The first is that at any fixed point x, the CDF of an r.v. X does not “capture” the extent of X below x: even if a tail has a small weight, the average value of X in the tail could be quite significant. As a result, reliance just on CDF may yield solutions with undesirable tail outcomes occurring with small probabilities. The other conceptual shortcoming is that in general, a safety first principle based on the CDF of X does not agree with a common perception that aggregation reduces volatility. This perception can be “captured” by the notion of convex order. Informally, an r.v. \( X \) dominates an r.v. \( Y \) in convex order, and we write \(X \geq _{cx} Y\), if the distribution of \( Y \) can be obtained from the distribution of \( X \) by aggregation. A formal definition is that \(X \geq _{cx} Y\) if \({{\mathbb {E}}}[g(X)] \geq {{\mathbb {E}}}[g(Y)]\) for any convex function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\). \(X \geq _{cx} Y\) implies that \({{\mathbb {E}}}[X]={{\mathbb {E}}}[Y]\) and \(\sigma (X) \geq \sigma (Y)\), where \(\sigma (X)\) denotes standard deviation. This supports the perception that Y, being aggregated from X and having the same expected value as X, is less volatile than X. A functional f is called Schur-convex if \(X \geqslant _{cx} Y\) implies that \(f(X) \geqslant f(Y)\).
In other words, a Schur-convex functional does not increase after data aggregation. Examples of such functionals include \(f(X)=\sigma (X)\) and \(f(X)=-{{\mathbb {E}}}[u(X)]\) for a concave function u. However, \(f(X)=F_X(x)={\mathbb {P}}[X\leqslant x]\) with fixed x is not Schur-convex. As a result, if \(X^*\in \mathop {\mathrm{arg\,min}}_{X\in {{{\mathcal {X}}}}} F_X(x)\) for a feasible set \({{{\mathcal {X}}}}\) and \(x \in {\mathbb {R}}\), there can exist \(Y\in {{{\mathcal {X}}}}\) such that \(X^* \geqslant _{cx} Y\) and \(F_{X^*}(x)< F_Y(x)\). In this case, \({{\mathbb {E}}}[Y]={{\mathbb {E}}}[X^*]\) and \(\sigma (Y) \leqslant \sigma (X^*)\), but Y is “less safe” than \(X^*\).
In the one-dimensional case, i.e., for an r.v. X, all these CDF deficiencies were addressed with the functions of superquantile and buffered probability of exceedance (bPOE) [6]. Superquantile, which is also known as expected shortfall and conditional value-at-risk (CVaR) [14, 15] of X with confidence level \(\alpha\), is the expectation of the right \((1-\alpha )\)-tail of the distribution of X, i.e., it is the average of the largest outcomes with total probability \(1-\alpha\). Superquantile can be defined as follows
where \(q_X(s) = \inf \{x \in {\mathbb {R}}\,|\,F_X(x)>s\}\) is the quantile function and \(z^+\) denotes \(\max \{0,z\}\). bPOE is an extension of the buffered probability of failure [13] and is equal to 1 minus the inverse of superquantile, see [6],
It can also be evaluated by
see [6, 7]. Considered as a function of r.v. X for fixed x, bPOE is the minimal quasi-convex upper bound (see [6]) of probability of exceedance (POE),
Superquantile and bPOE constraints are equivalent, see [6]:
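For an equal-probability sample, the one-dimensional superquantile and bPOE can be evaluated in a few lines directly from the minimization formulas cited above. A minimal numerical sketch (the helper names `superquantile` and `bpoe` are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def superquantile(sample, alpha):
    """CVaR/superquantile via the minimization formula
    min_c { c + E[X - c]^+ / (1 - alpha) } for an equal-probability sample."""
    f = lambda c: c + np.maximum(sample - c, 0).mean() / (1 - alpha)
    # for alpha in (0, 1) the convex piecewise-linear objective attains its
    # minimum at a breakpoint, i.e., at one of the sample points
    return min(f(c) for c in sample)

def bpoe(sample, x):
    """bPOE via min_{a >= 0} E[a (X - x) + 1]^+ (one-dimensional case)."""
    f = lambda a: np.maximum(a * (sample - x) + 1, 0).mean()
    res = minimize_scalar(f, bounds=(0.0, 1e6), method='bounded')
    return min(res.fun, 1.0)  # a = 0 always yields the value 1
```

For the sample \(\{1,2,3,4\}\), the superquantile at \(\alpha =0.75\) is 4 (the average of the worst 25% of outcomes), and the bPOE at \(x=3.5\) is 0.5, since the largest 50% of outcomes average exactly 3.5.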
The aim of this work is to construct extensions of multivariate CDF, defined for the lower tail rather than the upper tail. The analogue of superquantile for the lower tail will be called subquantile:
To define an analog of bPOE for the lower tail, let an r.v. X with the CDF \(F_X:{\mathbb {R}}\rightarrow [0,1]\) represent a reward of some kind, e.g., money, energy, satisfied demand, etc. Then for a threshold \(\alpha \in (0,1]\), buffered CDF (bCDF) is defined as an inverse of subquantile, i.e., \({\overline{F}}_X(x)=\alpha\), where \(\alpha\) is a solution of the equation \(x={\underline{q}}_X(\alpha )\). In other words,
For every fixed x, bCDF \({\overline{F}}_X(x)\) is a quasi-convex function of r.v. X and is the minimal quasi-convex upper bound of \(F_X(x)\). This follows from the fact that bPOE is the minimal quasi-convex upper bound of POE [6]; see also [18, Eq’s (3.2.6)–(3.2.8)] and [2, Eq (30)]. bCDF admits a representation similar to (1):
Similar to the constraint equivalence (2) for the upper tail, (4) yields constraint equivalence for the lower tail:
Under certain conditions, the maximum in the last constraint can be dropped, and the constraint takes the form \(c-\alpha ^{-1}{\mathbb {E}}[c-X]^+\geqslant x\), which can be reformulated as linear constraints for discrete distributions. However, the main advantage of bCDF compared to CDF is that bCDF describes the distribution tails. For instance, when r.v. \(X\) represents a portfolio value and depends on the portfolio weights (decision variables), minimization of \({\overline{F}}_X(x)\) yields an optimal solution \(X^*\) with \({\overline{F}}_{X^*}(x)=\alpha ^*\) such that the average value of \(X^*\) in \(\alpha ^*\cdot 100\%\) of the worst outcomes is at least x. Under some mild conditions, optimization of one-dimensional bCDF can be reduced to convex and linear programming (for discrete distributions). For example, let \({{\textbf{X}}}= (X_1,\dots ,X_m)\) be a known random vector, \(x\in {\mathbb {R}}\) a given threshold, \({{\textbf{w}}}=( w_1,\dots ,w_m) \in {\mathbb {R}}^m\) an unknown vector, and \(X={{\textbf{w}}}^{\! \top } {{\textbf{X}}}= w_1 X_1 +\cdots + w_m X_m\). Then, the problem of minimizing \({\overline{F}}_X(x)\) with respect to \({{\textbf{w}}}\) can be formulated with (5) as
where \({\textbf{c}} =a\,{{\textbf{w}}}\) is a new decision vector. The last optimization problem can be reduced to convex and linear programming for a discrete distribution as in [6]. This also follows from a simple observation that
Rockafellar and Royset [12] introduced super CDF (sCDF) as the inverse of superquantile, i.e., \({\underline{F}}_{X}(x)={\overline{q}}^{\,-1}_X (x) = 1- {\overline{p}}_X(x)\), which can be shown to be the maximal quasi-concave lower bound of CDF. It follows from (1) that
We will refer to super CDF as reduced CDF (rCDF) to emphasize that this function is a lower bound of CDF, as opposed to buffered CDF, which is an upper bound of CDF.
It follows from (5) and (7) that bCDF and rCDF are related by
Generalizing bCDF and rCDF for random vectors is not straightforward. The difficulty is that in the multivariate case, the notions of quantile and its inverse are not well-defined. A potential insight can be offered through exploring the properties of (5) and (7). This work generalizes bCDF and rCDF given by (5) and (7) to random vectors. In contrast to the one-dimensional case, the multivariate bCDF and rCDF are not quasi-convex and quasi-concave functions of a random vector. We show that they are the minimal Schur-convex upper bound and the maximal Schur-concave lower bound of the multivariate CDF, respectively, and that their special structure allows us to construct an algorithm that can quickly find their local extrema. We demonstrate the efficiency of the algorithm in a case study on optimization of a collateralized debt obligation with bCDF functions in constraints.
The paper is organized into five sections and five appendices. Section 2 introduces bCDF and rCDF and shows that they are upper and lower bounds for a multivariate CDF. Section 3 considers bCDF and rCDF in optimization problems. Section 4 presents applications of bCDF and rCDF. Section 5 discusses the case study. Appendices A, B and C present proofs of three propositions and Appendices D and E summarize algorithms for solving optimization problems with bCDF and rCDF in objective function and constraints.
2 The lower and upper bounds for multivariate CDF
Let \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) be a probability space, where \(\Omega\) is an arbitrary non-empty set, \({{{\mathcal {F}}}}\) is the \(\sigma\)-algebra of subsets of \(\Omega\), and \({\mathbb {P}}\) is a probability measure on \((\Omega ,{{{\mathcal {F}}}})\). Sets in \({{{\mathcal {F}}}}\) are called events. A probability space \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) is called atomless if there exists an r.v. on it with a continuous CDF.
An n-dimensional random vector \({{\textbf{X}}}=(X_1, \dots , X_n)\) is a function \({{\textbf{X}}}:\Omega \rightarrow {\mathbb {R}}^n\) such that for every \({{\textbf{x}}}\in {\mathbb {R}}^n\) the set \(\{\omega \in \Omega \,|\,{{\textbf{X}}}(\omega ) \leqslant {{\textbf{x}}}\}\) is an event, and \(X_1, \dots , X_n\) are called r.v.’s. On an atomless probability space, there exists a collection of random vectors with any given joint distribution. Let \(L^{1,n}(\Omega )\) and \(L^{\infty ,n}(\Omega )\) denote the sets of n-dimensional random vectors \({{\textbf{X}}}\) for which \({{\mathbb {E}}}[{{\textbf{X}}}]=\int _\Omega {{\textbf{X}}}(\omega )d{\mathbb {P}}\) and \(\max\limits_i \sup |{{X_i}}|\) exist and are finite, respectively.
Definition 1
A random vector \({{\textbf{X}}}\in L^{1,n}(\Omega )\) dominates random vector \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) in convex order, and we write \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), if \({{\mathbb {E}}}[g({{\textbf{X}}})] \geqslant {{\mathbb {E}}}[g({{\textbf{Y}}})]\) for any convex function \(g:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\).
The convexity of g implies the existence of \({{\textbf{a}}}\in {\mathbb {R}}^n\) and \(b \in {\mathbb {R}}\) such that \(g({{\textbf{x}}}) \geqslant {{\textbf{a}}}^\top {{\textbf{x}}} + b\) for all \({{\textbf{x}}}\in {\mathbb {R}}^n\), so that \({{\mathbb {E}}}[g({{\textbf{X}}})] \geqslant {{\mathbb {E}}}[{{\textbf{a}}}^\top {{\textbf{X}}}+ b] = {{\textbf{a}}}^\top {{\mathbb {E}}}[{{\textbf{X}}}] + b > -\infty\) is well-defined for every \({{\textbf{X}}}\in L^{1,n}(\Omega )\), although it can be \(+\infty\).
Definition 2
A function \(f:L^{1,n}(\Omega ) \rightarrow {{\mathbb {R}}}\) is Schur-convex if \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) implies \(f({{\textbf{X}}}) \geqslant f({{\textbf{Y}}})\). A function f is Schur-concave if \(-f\) is Schur-convex.
Definition 2 implies that any function in the form \(f({{\textbf{X}}})={{\mathbb {E}}}[g({{\textbf{X}}})]\) is Schur-convex and Schur-concave if g is convex and concave, respectively. Any function f has the unique minimal Schur-convex upper bound given by
see [6], as well as the unique maximal Schur-concave lower bound
Let
denote multivariate CDF of \({{\textbf{X}}}\).
Definition 3
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), buffered CDF (bCDF) and reduced CDF (rCDF) are defined by
respectively, where \({{\textbf{a}}}=(a_1, \dots , a_n)\), \({{\textbf{a}}}^{\! \top } {{\textbf{X}}}=\sum _{i=1}^n a_i X_i\), and \({\mathbb {R}}^n_+=\{{{\textbf{a}}}\in {\mathbb {R}}^n, \, {{\textbf{a}}}\geqslant 0\}\).
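For a discrete distribution with equally likely scenarios, both bounds in Definition 3 reduce to linear programs via the standard epigraph linearization of \([\,\cdot \,]^+\). A minimal sketch, assuming (9) and (10) take the forms \(\inf _{{{\textbf{a}}}\geqslant 0}{\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{x}}}-{{\textbf{X}}})+1]^+\) and \(1-\inf _{{{\textbf{a}}}\geqslant 0}{\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{X}}}-{{\textbf{x}}})+1]^+\) (the expressions appearing in the proofs of Propositions 2.1 and 2.4); the function names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def bcdf(X, x):
    """bCDF of equal-probability scenarios X (m x n) at threshold x (n,):
    min over a >= 0 of (1/m) sum_j [a^T (x - X_j) + 1]^+, linearized with
    epigraph variables t_j >= a^T (x - X_j) + 1, t_j >= 0."""
    X, x = np.asarray(X, float), np.asarray(x, float)
    m, n = X.shape
    c = np.concatenate([np.zeros(n), np.full(m, 1.0 / m)])  # minimize mean t
    A_ub = np.hstack([x - X, -np.eye(m)])  # a^T (x - X_j) - t_j <= -1
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + m))
    return res.fun

def rcdf(X, x):
    """rCDF as 1 minus the analogous minimization with X - x in place of
    x - X, i.e., 1 - bcdf(-X, -x)."""
    return 1.0 - bcdf(-np.asarray(X, float), -np.asarray(x, float))
```

For example, for two-dimensional scenarios \((-1,-1)\), (1, 1), (1, 1) (equally likely) and \({{\textbf{x}}}={{\textbf{0}}}\), this gives \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=2/3\) and \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=0\), bracketing \(F_{{{\textbf{X}}}}({{\textbf{0}}})=1/3\) in agreement with Proposition 2.1.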
Remark
With an auxiliary variable \(a_0 \geqslant 0\) and the following set (simplex)
bCDF and rCDF, defined by (9) and (10), respectively, can be recast as
provided that the infima in (12) and (13) with respect to \(a_0\geqslant 0\) are attained. Representations (12) and (13) show that bCDF and rCDF of a random vector \({{\textbf{X}}}\) are, in fact, bPOE of corresponding r.v.’s minimized with respect to the vector \({{\textbf{a}}}\) over the simplex \({{\textbf{A}}}\). These representations are important in devising efficient algorithms for solving optimization problems with bCDF and rCDF. If \({{\textbf{X}}}\) depends on decision variables, optimal decision problems involving bCDF and rCDF in either the objective function or constraints are bilevel optimization problems, see Sect. 3.
First we show that bCDF and rCDF are the upper and lower bounds for CDF, respectively.
Proposition 2.1
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\) and any \({{\textbf{x}}}\in {\mathbb {R}}^n\),
Proof
Let \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) be arbitrary, and let
be the event that \({{\textbf{X}}}(\omega )< {{\textbf{x}}}\). Then for any \(\omega \not \in A\) we have \(X_i(\omega )\geqslant x_i\) for some i, so that
where \(I_A\) is the indicator function of the event A. Also, for \(\omega \in A\),
and consequently,
Since \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) is arbitrary, the last inequality and (10) yield the first inequality in (14). Similarly, if
is the event that \({{\textbf{X}}}(\omega )\leqslant {{\textbf{x}}}\), then for any \(\omega \in B\),
and for any \(\omega \not \in B\),
so that
which along with (9) yields the last inequality in (14). \(\square\)
It will be shown later that bCDF and rCDF are the minimal Schur-convex upper bounds of \(F_{{{\textbf{X}}}}({{\textbf{x}}})\) and the maximal Schur-concave lower bounds of \({\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]\), respectively.
For an r.v. X,
where \({\overline{p}}_X(x)\) is the (one-dimensional) bPOE defined in (1).
Also, note that
Note that \(F_{{\textbf{X}}}(\textbf{x})\) can be represented by
where “\(W \in \{0,1\}\)” indicates that the supremum is over all random variables such that \(W(\omega )\in \{0,1\}\) for every \(\omega \in \Omega\), while the constraint \(X_i W \leqslant x_i W\) means that \(X_i(\omega ) W(\omega ) \leqslant x_i W(\omega )\) for all \(\omega \in \Omega\). The supremum in (16) is attained when W is the indicator function of the event \({{\textbf{X}}}\leqslant \textbf{x}\). Similarly, bCDF has the following interpretation through dual characterization.
Proposition 2.2
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\) and any \({{\textbf{x}}}\in {\mathbb {R}}^n\),
where \(W\in [0,1]\) means that \(0 \leqslant W(\omega ) \leqslant 1\) for all \(\omega \in \Omega\).
Proof
See Appendix A. \(\square\)
Proposition 2.3
If the underlying probability space \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) is atomless, then \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is the minimal Schur-convex upper bound for \(F_{{{\textbf{X}}}}({{\textbf{x}}})\).
Proof
See Appendix B. \(\square\)
When \(n=1\), \({\overline{F}}_X(x)\) is also the unique minimal quasi-convex upper bound for \(F_X(x)\), see [6]. This is not true when \(n\geqslant 2\).
Proposition 2.4
In general (when \(n\geqslant 2\)), \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is not quasi-convex in \({{\textbf{X}}}\).
Proof
Let \(n\geqslant 2\), \({{\textbf{0}}}=(0,0,\dots ,0)\), \({{\textbf{x}}}_0=(1,-1,0,\dots ,0)\in {\mathbb {R}}^n\) and \({{\textbf{X}}}\) be such that \({\mathbb {P}}[{{\textbf{X}}}={{\textbf{x}}}_0]=1\). Then \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=\min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{0}}}-{{\textbf{X}}}) + 1]^+ = \min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} [-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+ = 0\), where the last equality follows from \([-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+\geqslant 0\) and that \([-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+=0\) for \({{\textbf{a}}}=(1,0,\dots ,0)\). Similarly, for \({{\textbf{a}}}=(0,1,0\dots ,0)\), it is valid \([{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+=0\), which yields \({\overline{F}}_{-{{\textbf{X}}}}({{\textbf{0}}})=0\). However, \({\overline{F}}_{({{\textbf{X}}}+(-{{\textbf{X}}}))/2}({{\textbf{0}}})={\overline{F}}_{{{\textbf{0}}}}({{\textbf{0}}})=1>0\). Consequently, \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is not quasi-convex. \(\square\)
Remark
The proof of Proposition 2.4 uses constant random vectors. Suppose that \(n=2\) and that \({{\textbf{X}}}\) is a constant r.v., i.e., \({\mathbb {P}}[{{\textbf{X}}}=(x_1,x_2)]=1\). Let
If \(x_1 \leqslant 0\) and \(x_2 \leqslant 0\), then \(a_1(-x_1)+a_2(-x_2) + 1 \geqslant 1\), so that \(g(x_1,x_2)=1\). If \(x_1>0\) then for \(a_1=1/x_1\) and \(a_2=0\), we have \(a_1(-x_1)+a_2(-x_2) + 1 = 0\), and consequently, \(g(x_1,x_2)=0\). Similarly, if \(x_2>0\) then \(g(x_1,x_2)=0\). Thus, \(g(x_1,x_2)=1\) when \(x_1\leqslant 0\) and \(x_2 \leqslant 0\), and \(g(x_1,x_2)=0\) otherwise. This function is not quasi-convex and cannot be made quasi-convex by altering it values at a finite number of points.
Now let \(I=\{\left. (a'_1, a'_2)\right| a'_1 \geqslant 0, \, a'_2 \geqslant 0, \, a'_1 + a'_2 = 1\}\) be the interval with endpoints (1, 0) and (0, 1). Then \(g(x_1,x_2)\) can be represented as
where
Observe that \(h_{a'_1,a'_2}(x_1,x_2)=1\) when \(a'_1x_1 + a'_2x_2 \leqslant 0\) and \(h_{a'_1,a'_2}(x_1,x_2)=0\) otherwise, and that \(h_{a'_1,a'_2}(x_1,x_2)\) is quasi-convex. However, the infimum of such functions over \((a'_1,a'_2) \in I\) is not quasi-convex. As \((a'_1,a'_2)\) moves along I from (1, 0) to (0, 1), the line \(a'_1x_1 + a'_2x_2=0\) rotates from the y-axis to the x-axis. As a result, \(g(x_1,x_2) = 1\) if and only if the point \((x_1,x_2)\) remains “under” the rotating line, which occurs if and only if \(x_1 \leqslant 0\) and \(x_2 \leqslant 0\).
Note that \(g(x_1,x_2)\) is the infimum of quasi-convex functions \(h_{a'_1,a'_2}(x_1,x_2)\), but \(g(x_1,x_2)\) is not “closely approximated” by any of these functions. Also, every function \(h_{a'_1,a'_2}(x_1,x_2)\) is a quasi-convex upper bound for \(g(x_1,x_2)\). However, none of these quasi-convex upper bounds is “best” or “minimal”. So, \(g(x_1,x_2)\) has many quasi-convex upper bounds, but the unique “minimal” quasi-convex upper bound does not exist.
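The failure of quasi-convexity can be checked mechanically from the closed form of \(g(x_1,x_2)\) derived above; a toy sketch:

```python
def g(x1, x2):
    """Closed form derived in the remark: g = 1 on the third quadrant
    (x1 <= 0 and x2 <= 0) and g = 0 otherwise."""
    return 1.0 if (x1 <= 0 and x2 <= 0) else 0.0

# Quasi-convexity would require g at a midpoint to be at most the maximum
# of g at the segment's endpoints; the segment from (1, -1) to (-1, 1)
# violates this at its midpoint (0, 0).
p, q = (1.0, -1.0), (-1.0, 1.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)   # = (0, 0)
assert g(*mid) > max(g(*p), g(*q))             # 1 > 0: not quasi-convex
```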
In fact, \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is also not quasi-concave in \({{\textbf{X}}}\), even for \(n=2\), see [9]. It can be interpreted through dual characterization in a way similar to that of bCDF. Indeed,
Proposition 2.5
Define
Also, let
be the maximal Schur-concave lower bound for \({\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]\). Then on an atomless probability space,
Proof
See Appendix C. \(\square\)
For illustration of these results, let r.v.’s \(X_1\) and \(X_2\) model profits from two different investments and let an investor evaluate the probability of both \(X_1\) and \(X_2\) to be non-positive. Let \({{\textbf{X}}}=(X_1,X_2)\), \({{\textbf{0}}}=(0,0)\), and let \(F_{{{\textbf{X}}}}({{\textbf{0}}})={\mathbb {P}}[{{\textbf{X}}}\leqslant {{\textbf{0}}}]\). Let \({{\textbf{X}}}\) assume values \((-1,-1)\), (1, 1) with probabilities 1/3 and 2/3, respectively. Then \(F_{{{\textbf{X}}}}({{\textbf{0}}})=1/3\). Suppose \({{\textbf{X}}}\) takes values \((-1,-1)\), (1, 1), and (1, 1) at the elementary events \(\omega _1\), \(\omega _2\) and \(\omega _3\) with probabilities \({\mathbb {P}}[\omega _1]={\mathbb {P}}[\omega _2]={\mathbb {P}}[\omega _3]=1/3\). Let \({{{\mathcal {F}}}}=\{A,B\}\) be a partition of the probability space by the events \(A=\{\omega _1,\omega _2\}\) and \(B=\{\omega _3\}\). Then \({{\textbf{Y}}}={\mathbb {E}}[{{\textbf{X}}}|{{{\mathcal {F}}}}]\) is a random vector equal to (0, 0) and (1, 1) with probabilities 2/3 and 1/3, respectively. Note that \({{\textbf{Y}}}\) is an “averaged” version of \({{\textbf{X}}}\). However, we have \(F_{{{\textbf{Y}}}}({{\textbf{0}}})=2/3 > 1/3 = F_{{{\textbf{X}}}}({{\textbf{0}}})\). This means that \(F_{{{\textbf{X}}}}({{\textbf{0}}})\) is not Schur-convex as a function of \({{\textbf{X}}}\). Instead, we suggest using its Schur-convex upper bound given in Proposition 2.3; such a bound does not increase when \({{\textbf{X}}}\) is replaced by its averaged-out version.
This example also demonstrates convenience of modeling the probability space as atomless even for discrete random variables. If the original random vector \({{\textbf{X}}}\) were defined on a discrete probability space \(\Omega =\{\omega _1,\omega _2\}\) with \({\mathbb {P}}[\omega _1]=1/3\) and \({\mathbb {P}}[\omega _2]=2/3\), the “averaged-out” version \({{\textbf{Y}}}\) of \({{\textbf{X}}}\) would not exist. For this \(\Omega\),
which illustrates that Proposition 2.3 fails on discrete probability spaces.
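The numbers in this example are easy to verify programmatically; a small sketch with the scenario data above:

```python
import numpy as np

def cdf(S, p, x):
    """Discrete multivariate CDF P[S <= x] for scenarios S with probabilities p."""
    return p[(S <= x).all(axis=1)].sum()

X, pX = np.array([[-1.0, -1.0], [1.0, 1.0]]), np.array([1/3, 2/3])
# Y = E[X | F]: averaging the two equal-probability copies of the scenarios
# at omega_1 and omega_2 yields (0, 0) with probability 2/3
Y, pY = np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([2/3, 1/3])

x0 = np.zeros(2)
# aggregation increased the CDF value (1/3 -> 2/3): the CDF is not Schur-convex
assert cdf(X, pX, x0) < cdf(Y, pY, x0)
```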
3 Buffered and reduced CDFs in optimization problems
Let \({{\textbf{X}}}\) be a random vector, \({{\textbf{x}}}\in {\mathbb {R}}^n\), and let \(\textbf{f}({{\textbf{w}}};{{\textbf{X}}})= (f_1({{\textbf{w}}};{{\textbf{X}}}),\ldots , f_n({{\textbf{w}}};{{\textbf{X}}}))\) be a vector of concave functions of \({{\textbf{w}}}\) and \({\textbf{h}}({{\textbf{w}}};{{\textbf{X}}})= (h_1({{\textbf{w}}};{{\textbf{X}}}),\ldots ,h_n({{\textbf{w}}};{{\textbf{X}}}))\) be a vector of convex functions of \({{\textbf{w}}}\), defined on a convex set \({{\textbf{W}}}\subseteq {\mathbb {R}}^k\). Let also \({\mathbb {E}}[|f_i ({{\textbf{w}}}; {{\textbf{X}}})|] < \infty\), \({\mathbb {E}}[|h_i ({{\textbf{w}}}; {{\textbf{X}}})|] < \infty\) for all \({{\textbf{w}}}\in {{\textbf{W}}}\) and \(i\in \{1,\dots ,n\}\).
If all \(f_i\) in the definition of \({\textbf{f}}({{\textbf{w}}};{{\textbf{X}}})\) are convex, and \({{{\mathcal {W}}}}= {{{\mathcal {W}}}}_1 \times \cdots \times {{{\mathcal {W}}}}_n\) with all \({{{\mathcal {W}}}}_1,\dots ,{{{\mathcal {W}}}}_n\) being convex, then the problem \(\min _\mathbf{w \in {{{\mathcal {W}}}}} {\overline{F}}_\mathbf{f(w;X)}(\textbf{x})\) is convex.
3.1 bCDF and rCDFs in objective
This section considers the following two optimization problems
and
With functions \(G_{\! F}({{\textbf{a}}}, {{\textbf{w}}})\) and \(G_{\! H}({{\textbf{a}}},{{\textbf{w}}})\) defined by
and
bCDF and rCDF can be recast as
and
It is assumed that in (20) and (21), the infimum and supremum with respect to \({{\textbf{a}}}\) can be replaced by the minimum and maximum, respectively. With this assumption, optimization problems (18) and (19) along with (20) and (21) take the form
and
respectively, and (22) and (23) can be written as
Further, with an auxiliary variable \(a_0 \geqslant 0\) and with the set \({{\textbf{A}}}\) defined by (11), problem (24) can be equivalently reformulated in the form
The idea is to solve (25) iteratively by alternating between the variables \((a_0, {{\textbf{w}}})\) and \({{\textbf{a}}}\): solve (25) with respect to \((a_0, {{\textbf{w}}})\), then with respect to \({{\textbf{a}}}\), then again with respect to \((a_0, {{\textbf{w}}})\), and so on. This method is known as coordinate descent or, more generally, Block Coordinate Descent (BCD), in which variables are divided into m blocks, and optimization is done iteratively for one block at a time. BCD may not converge to stationary points of a non-convex objective function, even one that is convex in each block of coordinates [10]. However, global convergence was studied under additional assumptions: the two-block case (\(m=2\)) or strict quasiconvexity in \(m-2\) blocks [3, 4] and uniqueness of the minimizer in each block [1, Sect. 2.7]. See also [5] for convergence of BCD for nonconvex problems and for relevant references.
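A schematic two-block version of this alternation can be sketched as follows (a generic illustration with hypothetical names, not the algorithm of Appendix D; each `minimize` call stands in for the corresponding convex subproblem solve):

```python
import numpy as np
from scipy.optimize import minimize

def bcd_two_blocks(obj, w0, a0, max_iter=50, tol=1e-7):
    """Alternate solves over the two blocks until the objective stops
    improving; obj(w, a) is assumed convex in each block separately."""
    w, a = np.asarray(w0, float), np.asarray(a0, float)
    prev = obj(w, a)
    for _ in range(max_iter):
        w = minimize(lambda v: obj(v, a), w).x          # block (a0, w)
        a = minimize(lambda v: obj(w, v), a,
                     bounds=[(0, None)] * a.size).x     # block a >= 0
        cur = obj(w, a)
        if prev - cur < tol:
            break
        prev = cur
    return w, a
```

For a smooth separable test objective such as \((w-1)^2+(a-2)^2\) the scheme converges in one pass; for the piecewise-linear bPOE subproblems of this section, each block solve would instead be the convex/LP reformulation discussed in the text.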
Note that in our case the optimization problems
are convex with respect to \({{\textbf{a}}}\), since the objective is the maximum of linear functions with respect to components of the vector \({{\textbf{a}}}\). On the other hand, the optimization problems
can be reduced to convex programming in \({{\textbf{w}}}\), since they are bPOE minimization problems. Indeed,
Since all components of the vector function \({\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})\) are concave in \({{\textbf{w}}}\), the function \({{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}}))\) is convex in \({{\textbf{w}}}\). Consequently, minimization of bPOE with respect to \({{\textbf{w}}}\) for this function can be reduced to convex programming, see [6]. Also, (15a) and (15b) imply that
Since all components of the vector function \({\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})\) are convex in \({{\textbf{w}}}\), and components of the vector \({{\textbf{a}}}\) are nonnegative, the function
is convex in \({{\textbf{w}}}\). Consequently, minimization of bPOE for this function can be reduced to convex programming, see [6]. Finally, with bPOE functions, problem (25) can be formulated as follows
where
The algorithm for solving problem (28) is summarized in Appendix D.
3.2 bCDF and rCDF in constraints
This section considers minimization of an objective function \(V({{\textbf{w}}})\) with respect to variables \({{\textbf{w}}}\in {\mathbb {R}}^k\):
subject to constraints on buffered and reduced CDFs:
With the functions \(G_{\! F}(a_F \,{{\textbf{a}}}_F,{{\textbf{w}}})\) and \(G_{\! H}(a_H \,{{\textbf{a}}}_H,{{\textbf{w}}})\) introduced in Sect. 3.1 and under the assumption that in (20) and (21), the infimum and supremum with respect to \({{\textbf{a}}}\) can be replaced by the minimum and maximum, respectively, problem (29)–(31) can be reformulated as follows
subject to
Relationship (26) implies that the constraint (33) with respect to variables \((a_F, {{\textbf{a}}}_F, {{\textbf{w}}})\) can be replaced by
with respect to variables \(({{\textbf{a}}}_F, {{\textbf{w}}})\). By (2), the last constraint is equivalent to the superquantile constraint:
Similarly, (27) implies that the constraint (34) with respect to variables \((a_H, {{\textbf{a}}}_H, {{\textbf{w}}})\) can be replaced by
with respect to variables \(({{\textbf{a}}}_H, {{\textbf{w}}})\). By (2), the last constraint is equivalent to the superquantile constraint
Finally, problem (32)–(34) can be recast in the form
Problem (37) can be solved iteratively as follows: solve (37) with respect to variables \({{\textbf{w}}}\) with fixed \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), then minimize the subquantile functions in (35) and (36) with respect to \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), respectively, i.e., solve
then solve (37) again with respect to \({{\textbf{w}}}\) with previously found \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), then solve (38) and (39), and so on.
The described algorithm for solving problem (37) is summarized in Appendix E.
4 Applications
4.1 Two-asset option
A two-asset option is an exotic option whose payoff depends on prices \(p_A(t)\) and \(p_B(t)\) of assets A and B at time t. For example, it may have nonzero payoff if \(p_A(T)\) and \(p_B(T)\) simultaneously exceed some strike prices \(C_A\) and \(C_B\), respectively, at some future time T. In this case, if \(p_A(T)\) and \(p_B(T)\) are assumed to be random variables, the option pays with the probability
Since in general, \(p_A(T)\) and \(p_B(T)\) are not independent random variables, \({\mathbb {P}}[p_A (T)\geqslant C_A, p_B(T) \geqslant C_B]\not ={\mathbb {P}}[p_A(T) \geqslant C_A] \cdot {\mathbb {P}}[p_B(T) \geqslant C_B]\). If \(p_A(0)\) and \(p_B(0)\) are (known) prices of assets A and B at the current time \(t=0\), then (40) can be recast in terms of adjusted rates of returns \(r_A=(p_A(T)-C_A)/p_A(0)\) and \(r_B=(p_B(T)-C_B)/p_B(0)\):
The joint distribution of \(r_A\) and \(r_B\) can be estimated from the historical data. Let \(r_{A,i}\) and \(r_{B,i}\), \(i=1,\dots , T\), be the adjusted rates of return of assets A and B over the last T periods. Then p can be estimated as the number of times \((r_{A,i},r_{B,i})\geqslant 0\) divided by T. However, this estimate is sensitive to small perturbations in the data. Alternatively, we can define \({{\textbf{X}}}=(-r_A,-r_B)\) and estimate the smallest Schur-convex upper bound (9) for p:
which is a linear program.
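A sketch of both estimates on a small set of hypothetical return scenarios (the LP uses the same epigraph linearization of \([\,\cdot \,]^+\) as in bPOE computations [6]; the data and helper names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def payoff_prob_and_bound(rA, rB):
    """Empirical estimate of p = P[r_A >= 0, r_B >= 0] together with the
    Schur-convex upper bound min_{a >= 0} E[a1 r_A + a2 r_B + 1]^+,
    solved as an LP with epigraph variables t_j (equal scenario weights)."""
    R = np.column_stack([rA, rB])                  # scenarios of (r_A, r_B)
    m = len(R)
    p_hat = np.mean((R >= 0).all(axis=1))
    c = np.concatenate([np.zeros(2), np.full(m, 1.0 / m)])
    A_ub = np.hstack([R, -np.eye(m)])              # a^T R_j - t_j <= -1
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 + m))
    return p_hat, res.fun

p_hat, bound = payoff_prob_and_bound(
    rA=[-2.0, 1.0, -3.0, 0.5], rB=[-1.0, 2.0, -2.0, 1.0])
assert p_hat <= bound <= 1.0 + 1e-9   # upper-bound property of Proposition 2.1
```

Note that the bound is informative (strictly below 1) only when the mean of \(-r_A\) or \(-r_B\) is positive, i.e., when the option is, on average, out of the money in the historical sample.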
4.2 Credit rating
Optimizing a portfolio of credit default swaps (CDSs) with constraints on the default probabilities of tranches [8] is a non-convex optimization problem. Let M be the number of tranches, T the number of time periods, and \(x_m^t\), \(m=1,\dots , M\), \(t=1,\dots ,T\), the attachment point of tranche m at time t. Also, suppose there are K available CDSs. Let \(\theta _k^t\) be the cumulative loss of k-th CDS at time t, and let \(y_k\) be the weight of the k-th asset in the asset pool. Then the total loss of the CDS pool at time t is \(L(\varvec{\theta }^t\!\!, {{\textbf{y}}}) = \sum _{k=1}^K \theta _k^t y_k\), where \(\varvec{\theta }^t = (\theta ^t_1,\dots , \theta ^t_K)\) and \({{\textbf{y}}}=(y_1,\dots , y_K)\). Consider constraints on the default probabilities of tranches [8]:
With \(X_t = L(\varvec{\theta }^t,{{\textbf{y}}})-x_m^t\), \(t=1,\dots ,T\), (41) can be recast as
Each constraint in (42) is non-convex and can be approximated by
where \({\overline{p}}_m\) is some scaled bound on probability; see [8]. Also, (15b)–(15d) imply that
where \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is defined in (10). Consequently, \(1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\) is a better upper bound for \({\mathbb {P}}[\max (X_1, \dots ,X_T) \geqslant 0]\) than \({\overline{p}}_{\max \{X_1, \dots , X_T\}}(0)\), and (43) can be replaced by
In fact, \(1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\) is the best Schur-convex upper bound on this probability for an atomless probability space; see Proposition 2.5. The optimization problem considered in [8] is formulated by
where \({{\textbf{x}}}=\{x_m^t\}_{m=2,\dots ,M}^{t=1,\dots ,T}\), r is the interest rate, \(\Delta s_m = s_m - s_{m-1}\), \(s_m\) is the spread payment for each tranche \(m=1, \dots , M\), \(c_k\) is the annual income spread payment of the k-th CDS, \(k=1, \dots , K\), and \(\eta\) is a real-valued parameter. After replacing constraints (41) by (44), the optimization problem (45) takes the form
where \({{\textbf{a}}}=\{a_m^t\}_{m=2,\dots ,M}^{t=1,\dots ,T}\).
5 Case study
Veremyev et al. [17] solved the CDS portfolio problem (45) and found an optimal portfolio of CDSs along with optimal attachment/detachment points over a multi-period horizon. Pertaia et al. [8] replaced the multivariate POE constraints in (45) by one-dimensional bPOE constraints (43) and solved the corresponding optimization problem (Footnote 5).
Here we use Algorithm 2 to solve problem (46a)–(46e) for 53 CDSs over 5 time periods with 5 tranches for 300,000 simulated scenarios and for \(\eta =120\) and \(\eta =80\) in (46d) (Footnote 6). The experiments were run on an Intel i7 2.6 GHz processor with 32 GB of DDR4-3200 MHz RAM and used Portfolio Safeguard (PSG; Footnote 7), which has a precoded superquantile (CVaR) function as well as many other popular risk functions.
Table 1 presents the objective value in Step 2 or 4, the running time in Step 2 or 4, and the running time of the subquantile sub-problem in Step 3 or 5 for each iteration of Algorithm 2 with \(\eta =120\) and \(\epsilon =10^{-7}\). It shows that no improvements are made after the second iteration (Algorithm 2 stops after the third iteration, but Table 1 shows one more iteration to demonstrate that there are no further improvements in the objective value). Constraint (36) in problem (37), which is solved in Steps 2 and 4, is equivalent to constraints (46b) with \(m=2\), 3, 4, 5. Table 2 reports the slack in constraints (46b) for \(m=2\), 3, 4, 5 as a percentage of the constraint right-hand side, i.e.,
At Step 2 in the first iteration, constraints (46b) for \(m=3\), 4, 5 are active; after that, all constraints in (46b) remain inactive until the algorithm stops after the third iteration. Since problem (46a)–(46e) without constraints (46b) is convex and none of the constraints (46b) is active at optimality, the solution found is globally optimal. Note that the one-dimensional bPOE constraints (43) are active at Step 2 in the first iteration.
Tables 3 and 4 present results similar to those in Tables 1 and 2 but for \(\eta =80\). This time, the objective value stops improving after the second iteration even though constraints (46b) for \(m=3\), 4, 5 remain active in all iterations.
The case study shows that optimization problems with bCDF functions can be solved efficiently, even at a large scale with 300,000 scenarios.
Notes
Here, “safety first principle based on the CDF” means that for a given threshold \(x \in {{\mathbb {R}}}\), an r.v. X is preferred to an r.v. Y, if and only if \(F_X(x)={\mathbb {P}}[X\leqslant x]\) does not exceed \(F_Y(x)={\mathbb {P}}[Y\leqslant x]\). For example, let \({\mathbb {P}}[X=-1]=1/2\) and \({\mathbb {P}}[X=1]=1/2\). Then for \(x=0\), \({\mathbb {P}}[X\leqslant 0]={\mathbb {P}}[-X\leqslant 0]=1/2\), whereas \({\mathbb {P}}[(X+(-X))/2\leqslant 0]={\mathbb {P}}[0\leqslant 0]=1>1/2\). Consequently, both X and \(-X\) are (strictly) preferred to \((X+(-X))/2\), contrary to the perception that aggregation reduces risk.
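The footnote's arithmetic can be checked mechanically. A minimal sketch, with X the two-point r.v. from the footnote represented by equally likely scenarios:

```python
import numpy as np

# X takes values -1 and 1 with probability 1/2 each (equally likely scenarios)
X = np.array([-1.0, 1.0])
Y = -X              # -X has the same distribution as X
Z = (X + Y) / 2.0   # the 50/50 aggregate, identically 0

F_X0 = np.mean(X <= 0.0)  # P[X <= 0]  = 1/2
F_Y0 = np.mean(Y <= 0.0)  # P[-X <= 0] = 1/2
F_Z0 = np.mean(Z <= 0.0)  # P[0 <= 0]  = 1
print(F_X0, F_Y0, F_Z0)
```

Under the safety-first criterion at threshold 0, both X and \(-X\) (CDF value 1/2) are preferred to the aggregate (CDF value 1), as the footnote states.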
Inequalities between vectors hold component-wise.
As usual, the indicator function of the event A is the random variable \(I_A\) such that \(I_A(w)=1\) for \(w \in A\) and \(I_A(w)=0\) for \(w \not \in A\).
See, e.g., [11, Theorem 27.1] for sufficient conditions for a convex function to attain its minimum.
Solutions of problem (45) and of the one with bPOE constraints are available at http://uryasev.ams.stonybrook.edu/index.php/research/testproblems/financial_engineering/structuring-step-up-cdo/.
The codes and solutions of this problem are available at http://uryasev.ams.stonybrook.edu/index.php/research/testproblems/financial_engineering/structuring-step-up-cdo/.
Portfolio Safeguard software is a product of American Optimal Decisions, Inc. (http://aorda.com). To optimize convex nonsmooth functions, PSG solves a sequence of linear or quadratic programming problems. For large-scale optimization problems, PSG can call the GUROBI solver (Gurobi Optimization, https://www.gurobi.com/).
References
Bertsekas, D.: Nonlinear programming. J. Oper. Res. Soc. 48(3), 334 (1997)
Grechuk, B., Molyboha, A., Zabarankin, M.: Chebyshev’s inequalities with law invariant deviation measures. Probab. Eng. Inf. Sci. 24, 145–170 (2010)
Grippo, L., Sciandrone, M.: Globally convergent block-coordinate techniques for unconstrained optimization. Optim. Methods Softw. 10(4), 587–637 (1999)
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26(3), 127–136 (2000)
Lyu, H.: Convergence and complexity of block coordinate descent with diminishing radius for nonconvex optimization, arXiv:2012.03503 (2020)
Mafusalov, A., Uryasev, S.: Buffered probability of exceedance: mathematical properties and optimization. SIAM J. Optim. 28(2), 1077–1103 (2018)
Norton, M., Uryasev, S.: Maximization of AUC and buffered AUC in binary classification. Math. Program. 174(1–2), 575–612 (2019)
Pertaia, G., Prokhorov, A., Uryasev, S.: A new approach to credit ratings. J. Bank. Finance 140, 106097 (2022)
Pinelis, I.: On the convexity of a certain set of random vectors. MathOverflow, https://mathoverflow.net/questions/415870/
Powell, M.: On search directions for minimization algorithms. Math. Program. 4(1), 193–201 (1973)
Rockafellar, R.T.: Convex Analysis, p. 451. Princeton University Press, Princeton, NJ (1970)
Rockafellar, R.T., Royset, J.O.: Random variables, monotone relations, and convex analysis. Math. Program. 148, 297–331 (2014)
Rockafellar, R.T., Royset, J.O.: On buffered failure probability in design and optimization of structures. Reliab. Eng. Syst. Saf. 95, 499–510 (2010)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Rockafellar, R.T., Uryasev, S.: Conditional value-at-Risk for general loss distributions. J. Bank. Finance 26, 1443–1471 (2002)
Rockafellar, R.T., Uryasev, S., Zabarankin, M.: Generalized deviations in risk analysis. Finance Stoch. 10(1), 51–74 (2006)
Veremyev, A., Tsyurmasto, P., Uryasev, S.: Optimal structuring of CDO contracts: optimization approach. J. Credit Risk 8(4), 133–155 (2012)
Zabarankin, M., Uryasev, S.: Statistical Decision Problems: Selected Concepts and Portfolio Safeguard Case Studies. Springer, Berlin (2014)
Acknowledgements
We are grateful to Mr. Edward Cummings from Stony Brook University for the numerical implementation of the case study. We are also grateful to the referees for their comments and suggestions, which helped to improve the quality of the paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 A. Proof of Proposition 2.2
With notation
Proposition 2.2 can be reformulated as follows.
Proposition A.1
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\),
Proof
Let \({\overline{p}}'({{\textbf{X}}})\) be the right-hand side of (49). We first prove that \({\overline{p}}'({{\textbf{X}}}) \leqslant {\overline{p}}({{\textbf{X}}})\). Let \({{{\mathcal {W}}}}\) be the feasible set in (49). For every \(W \in {{{\mathcal {W}}}}\), and every \({{\textbf{a}}}\in {\mathbb {R}}^n_+\),
Since the second and the third terms in the right-hand side are non-negative,
and since \(W \in {{{\mathcal {W}}}}\) and \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) are arbitrary, \({\overline{p}}'({{\textbf{X}}}) \leqslant {\overline{p}}({{\textbf{X}}})\) follows.
We next prove \({\overline{p}}'({{\textbf{X}}}) \geqslant {\overline{p}}({{\textbf{X}}})\). For \(n=1\), X is a random variable, and \({\overline{p}}'(X) \geqslant {\overline{p}}(X)\) was claimed in the proof of Proposition 2.5 in [6] (however, that proof was incomplete). In this case, let \({\underline{q}}_X(\alpha )\) be defined by (3) and let \({\underline{q}}^{-1}_X(x)\) be the inverse function of \({\underline{q}}_X(\alpha )\), which is well-defined for \(x\in [{\mathbb {E}}X, \sup X)\); see [6].
Then consider the following cases:
- If \(\sup X <0\), then \({{{\mathcal {W}}}}=\{0\}\) and \({\overline{p}}'(X)={\mathbb {E}}[0]=0\).
- If \(\sup X = 0\), then \(I_{\{X=0\}} \in {{{\mathcal {W}}}}\), hence \({\overline{p}}'(X)\geqslant {\mathbb {E}}[I_{\{X=0\}}]={\mathbb {P}}[X=0]={\mathbb {P}}[X=\sup X]\).
- If \({{\mathbb {E}}}[X]< 0 < \sup X\), then let \(t=q_X({\underline{q}}^{-1}_X(0))\) and let \(A^-\), \(A^0\), \(A^+\) be the events \(X<t\), \(X=t\), and \(X>t\), respectively. The definition of t implies that
$$\begin{aligned} 1-{\mathbb {P}}[A^+]-{\mathbb {P}}[A^0]\leqslant {\underline{q}}^{-1}_X(0) \leqslant 1-{\mathbb {P}}[A^+]. \end{aligned}$$(50)By definition of \({\underline{q}}^{-1}_X(0)\),
$$\begin{aligned} \begin{aligned} 0&= \int _{{\underline{q}}^{-1}_X(0)}^1 H_{X}(\alpha )d\alpha = \int _{{{\underline{q}}^{-1}_X(0)}}^{1-{\mathbb {P}}[A^+]} t\, d\alpha + \int _{1-{\mathbb {P}}[A^+]}^1 H_X(\alpha )d\alpha \\&= t\left( 1-{\mathbb {P}}[A^+]-{{\underline{q}}^{-1}_X(0)}\right) + \int _{A^+} X(\omega )d{\mathbb {P}}= r t {\mathbb {P}}[A^0] + \int _{A^+} X(\omega )d{\mathbb {P}}, \end{aligned} \end{aligned}$$where \(r=\frac{1-{\mathbb {P}}[A^+]-{{\underline{q}}^{-1}_X(0)}}{{\mathbb {P}}[A^0]}\) if \({\mathbb {P}}[A^0]>0\) and \(r=0\) if \({\mathbb {P}}[A^0]=0\). Note that \(0\leqslant r \leqslant 1\) by (50).
Now, let W be an r.v. assuming values of 0, r, and 1 on \(A^-\), \(A^0\), and \(A^+\), respectively. Then
$$\begin{aligned} {{\mathbb {E}}}[XW] = 0 + r \int _{A^0} X(\omega )d{\mathbb {P}}+ \int _{A^+} X(\omega )d{\mathbb {P}}= r t {\mathbb {P}}[A^0] + \int _{A^+} X(\omega )d{\mathbb {P}}= 0, \end{aligned}$$hence \(W \in {{{\mathcal {W}}}}\). Thus,
$$\begin{aligned} {\overline{p}}'(X)\geqslant {{\mathbb {E}}}[W] = r {\mathbb {P}}[A^0] + {\mathbb {P}}[A^+] = 1 - {{\underline{q}}^{-1}_X(0)}. \end{aligned}$$
- If \(0 \leqslant {{\mathbb {E}}}[X]\), then \(W \equiv 1 \in {{{\mathcal {W}}}}\), hence \({\overline{p}}'(X)= {{\mathbb {E}}}[1]=1\).
In summary, \({\overline{p}}'(X)\geqslant {\overline{p}}^+_X(0)\), where \({\overline{p}}^+_X(x)\) is defined in Definition 3.9 of [6]. By Proposition 3.10 in [6], \({\overline{p}}^+_X(0)=\min _{a \geqslant 0} {\mathbb {E}}[a X + 1]^+\), which finishes the proof of \({\overline{p}}'(X) \geqslant {\overline{p}}(X)\) for \(n=1\).
Now let n be an arbitrary positive integer. By contradiction, suppose that \({\overline{p}}'({{\textbf{X}}}) < {\overline{p}}({{\textbf{X}}})\) and choose any \(k \in ({\overline{p}}'({{\textbf{X}}}), {\overline{p}}({{\textbf{X}}}))\). Let
Then K is a convex set, and \({\overline{p}}'({{\textbf{X}}}) < k\) implies that \({\mathbb {R}}^n_+ \cap K = \emptyset\). By the separating hyperplane theorem, there exist a vector \({\textbf{c}} \in {\mathbb {R}}^n\) and a constant b such that (i) \({\textbf{c}}^\top {{\textbf{y}}}\geqslant b\) for all \({{\textbf{y}}}\in {\mathbb {R}}^n_+\) but (ii) \({\textbf{c}}^\top {{\textbf{y}}}< b\) for all \({{\textbf{y}}}\in K\). Condition (i) implies that \({\textbf{c}} \geqslant 0\) and \(b \leqslant 0\). Then (ii) implies that for every W satisfying \(0\leqslant W \leqslant 1\) and \({{\mathbb {E}}}[W]\geqslant k\), we have \(0 \geqslant b > {\textbf{c}}^\top {{\mathbb {E}}}[{{\textbf{X}}}W] = {{\mathbb {E}}}[Y W]\), where \(Y = {\textbf{c}}^\top {{\textbf{X}}}\). But this implies that \({\overline{p}}'(Y) < k\). On the other hand,
Since Y is a one-dimensional random variable, this contradicts the inequality \({\overline{p}}'(Y) \geqslant {\overline{p}}(Y)\) proved above for the case \(n=1\). \(\square\)
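As a numeric sanity check of the case \(n=1\), the formula \({\overline{p}}(X)=\min _{a \geqslant 0} {\mathbb {E}}[aX+1]^+\) can be evaluated on a two-point r.v. with \({\mathbb {E}}[X]<0<\sup X\). The sketch below uses hypothetical scenarios and a grid search in place of the one-dimensional minimization:

```python
import numpy as np

X = np.array([-2.0, 1.0])  # equally likely scenarios; E[X] = -0.5 < 0 < sup X = 1

# E[aX + 1]^+ over a grid of a >= 0 (the grid contains the minimizer a = 0.5)
a_grid = np.linspace(0.0, 5.0, 100_001)
values = np.maximum(np.outer(a_grid, X) + 1.0, 0.0).mean(axis=1)
bpoe = values.min()           # buffered probability of exceedance at 0
poe = np.mean(X >= 0.0)       # ordinary probability of exceedance at 0
print(round(bpoe, 6), poe)    # bpoe (~0.75) upper-bounds poe (0.5)
```

Here the buffered probability 0.75 is the fraction of the worst-case tail whose average loss equals the threshold 0, which strictly exceeds the plain exceedance probability 0.5, consistent with \({\overline{p}}(X)\) being an upper bound.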
1.2 B. Proof of Proposition 2.3
With notation (48), Proposition 2.3 can be reformulated as
Proposition B.1
On an atomless probability space \(\Omega\), \({\overline{p}}({{\textbf{X}}})\) is the minimal Schur-convex upper bound for \(p({{\textbf{X}}})={\mathbb {P}}[{{\textbf{X}}}\geqslant 0]\).
Proof
For any \({{\textbf{a}}}\in {\mathbb {R}}^n_+\), the function \(g({{\textbf{x}}})=[{{\textbf{a}}}^\top {{\textbf{x}}}+ 1]^+\) is convex in \({{\textbf{x}}}\), hence \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) implies that \({\mathbb {E}}[{{\textbf{a}}}^\top {{\textbf{X}}}+ 1]^+ \geqslant {\mathbb {E}}[{{\textbf{a}}}^\top {{\textbf{Y}}}+ 1]^+\). Taking the infimum over all \({{\textbf{a}}}\in {\mathbb {R}}^n_+\), we obtain \({\overline{p}}({{\textbf{X}}}) \geqslant {\overline{p}}({{\textbf{Y}}})\), and consequently, the function \({\overline{p}}({{\textbf{X}}})\) is Schur-convex. Also, for any \({{\textbf{a}}}\in {\mathbb {R}}^n_+\), we have \([{{\textbf{a}}}^\top {{\textbf{X}}}+ 1]^+ \geqslant I_{\{{{\textbf{X}}}\geqslant 0\}}\), so that \({\overline{p}}({{\textbf{X}}}) \geqslant {{\mathbb {E}}}[I_{\{{{\textbf{X}}}\geqslant 0\}}]={\mathbb {P}}[{{\textbf{X}}}\geqslant 0] = p({{\textbf{X}}})\), i.e., \({\overline{p}}({{\textbf{X}}})\) is indeed an upper bound for \(p({{\textbf{X}}})\). It remains to prove that this upper bound is minimal, i.e., for every \({{\textbf{X}}}\in L^{1,n}(\Omega )\), there exists \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) such that \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) and \({\mathbb {P}}[{{\textbf{Y}}}\geqslant 0] \geqslant {\overline{p}}({{\textbf{X}}})\).
Let \({{{\mathcal {W}}}}\) be the feasible set in (49). Consider the probability space \(\Omega '=\Omega \times [0,1]\), where [0, 1] is the unit interval with Lebesgue measure. Write elements of \(\Omega '\) as \((\omega ,t)\), where \(\omega \in \Omega\) and \(t\in [0,1]\). For arbitrary \(W \in {{{\mathcal {W}}}}\), let \(A=\{(\omega ,t) \in \Omega ': t \leqslant W(\omega )\}\). Then \({\mathbb {P}}'(A)={{\mathbb {E}}}[W]\), where \({\mathbb {P}}'\) is the probability measure on \(\Omega '\). For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), let \({{\textbf{X}}}' \in L^{1,n}(\Omega ')\) be the random vector defined by \({{\textbf{X}}}'(\omega ,t)={{\textbf{X}}}(\omega )\). Then the random vectors \({{\textbf{X}}}\) and \({{\textbf{X}}}'\) have the same distribution. Let \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|I_A]\), where \(I_A\) is the indicator function of the event A. Then \(W \in {{{\mathcal {W}}}}\) implies that \({{\mathbb {E}}}[X_i W] \geqslant 0\), \(i=1,\dots ,n\), which in turn implies that \({{\mathbb {E}}}[X'_i|I_A=1]\geqslant 0\) for each component \(X'_i\) of \({{\textbf{X}}}'\), or, equivalently, that \({{\textbf{Y}}}'(\omega ,t) \geqslant 0\) whenever \((\omega ,t)\in A\). Consequently, \({\mathbb {P}}'[{{\textbf{Y}}}'\geqslant 0] \geqslant {\mathbb {P}}'(A) = {{\mathbb {E}}}[W]\). Moreover, \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|I_A]\) implies that \({{\textbf{X}}}' \geqslant _{cx} {{\textbf{Y}}}'\). Hence, if there exists \({{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}\) such that \({\mathbb {P}}[{{\textbf{Y}}}\geqslant 0]={\mathbb {P}}'[{{\textbf{Y}}}'\geqslant 0]\), then, since \(W \in {{{\mathcal {W}}}}\) is arbitrary, \(\sup _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}}{\mathbb {P}}[{{\textbf{Y}}}\geqslant 0] \geqslant \sup _{W \in {{{\mathcal {W}}}}}{{\mathbb {E}}}[W] = {\overline{p}}({{\textbf{X}}})\), where the last equality is (49).
Since \(\Omega\) is atomless, there exists a pair of random vectors \({{\textbf{X}}}'' \in L^{1,n}(\Omega )\) and \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) with the same joint distribution as \({{\textbf{X}}}'\) and \({{\textbf{Y}}}'\). Then \({\mathbb {P}}[{{\textbf{Y}}}\geqslant 0]={\mathbb {P}}'[{{\textbf{Y}}}'\geqslant 0]\) and \({{\textbf{X}}}'' \geqslant _{cx} {{\textbf{Y}}}\). Since \({{\textbf{X}}}\) and \({{\textbf{X}}}''\) have the same distribution, \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), as required. \(\square\)
1.3 C. Proof of Proposition 2.5
Proposition 2.5 can be reformulated as follows.
Proposition C.1
Let
Also, let
Then on an atomless probability space,
Proof
First, let us show that (51) is a dual problem to (52): introduce dual variables \(a_i \geqslant 0\) and \(0 \leqslant M \in L^{1}(\Omega )\) for constraints \({\mathbb {E}}[X_i W_i] \geqslant 0\) and \(\sum _{i=1}^n W_i \leqslant 1\), respectively. Minimization of the Lagrangian for (52) with respect to \({{\textbf{a}}}= (a_1,\ldots ,a_n)\) and M yields
where the last equality follows from comparing M with \(m = \max \{a_1 X_1 + 1,\dots ,a_n X_n+1\}\). Indeed, if \(m < 0\), then the subdifferential \(\partial _M (M + \sum _{i=1}^n [1 + a_i X_i - M]^+) = \{1\}\) for \(M \geqslant 0\), so \(M = 0\) is optimal; otherwise, at \(M = m\), the subdifferential is \(1 + k \cdot [-1, 0] + (n - k) \cdot 0 \ni 0\) for some \(k\in \{1,\dots ,n\}\), so \(M = m\) is optimal.
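For completeness, the Lagrangian step can be written out. The sketch below is a reconstruction from the proof's own notation: it assumes (52) maximizes \({\mathbb {E}}[\sum _{i=1}^n W_i]\) over \(W_i \geqslant 0\) subject to the two constraint families to which the dual variables \(a_i\) and M are attached, rather than quoting (52) verbatim.

```latex
\mathcal{L}(W,\mathbf{a},M)
  = \mathbb{E}\Big[\sum_{i=1}^n W_i\Big]
    + \sum_{i=1}^n a_i\,\mathbb{E}[X_i W_i]
    + \mathbb{E}\Big[M\Big(1-\sum_{i=1}^n W_i\Big)\Big]
  = \mathbb{E}[M] + \sum_{i=1}^n \mathbb{E}\big[W_i\,(a_i X_i + 1 - M)\big].
```

Taking the supremum over \(W_i \geqslant 0\) pointwise forces \(M \geqslant a_i X_i + 1\) almost surely for every i; together with \(M \geqslant 0\), minimizing \({\mathbb {E}}[M]\) then gives \(M=\max \{a_1X_1+1,\dots ,a_nX_n+1,\,0\}\), i.e., the dual objective \({\mathbb {E}}[\max _i\{a_iX_i+1\}]^+\) minimized over \({{\textbf{a}}}\in {\mathbb {R}}^n_+\).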
Now, let \({{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}\) and \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) be arbitrary. The function \(u({{\textbf{x}}})=[\max \{a_1 x_1,\dots ,a_n x_n\} + 1]^+\) is the maximum of finitely many affine functions and is therefore convex. Consequently,
This implies that
We next adapt the proof of Proposition 2.3 to prove that
Indeed, take any \(W_1, \dots , W_n\) in the feasible set of (52). Consider the probability space \(\Omega '=\Omega \times [0,1]\), where [0, 1] is the unit interval with Lebesgue measure. Write elements of \(\Omega '\) as \((\omega ,t)\), where \(\omega \in \Omega\) and \(t\in [0,1]\). For every \(j = 1,\dots ,n\), let \(A_j=\{(\omega ,t) \in \Omega ':\sum _{i=1}^{j-1}W_i(\omega ) < t \leqslant \sum _{i=1}^{j}W_i(\omega )\}\). Also, let \(A=\{(\omega ,t) \in \Omega ':\sum _{i=1}^n W_i(\omega ) < t\}\). Then the sets \(A_1,\dots ,A_n,A\) form a partition of \(\Omega '\), which we denote \({{{\mathcal {F}}}'}\). Note that \(1-{\mathbb {P}}'(A)={{\mathbb {E}}}[{\textbf{e}}^\top {{\textbf{W}}}]\), where \({\mathbb {P}}'\) is the probability measure on \(\Omega '\) and \({\textbf{e}}^\top {{\textbf{W}}}=\sum _{i=1}^n W_i\). For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), let \({{\textbf{X}}}' \in L^{1,n}(\Omega ')\) be the random vector defined by \({{\textbf{X}}}'(\omega ,t)={{\textbf{X}}}(\omega )\). Then the random vectors \({{\textbf{X}}}\) and \({{\textbf{X}}}'\) have the same distribution. Let \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|{{{\mathcal {F}}}'}]\). Since \(W_1, \dots , W_n\) are in the feasible set of (52), we have \({{\mathbb {E}}}[X_i W_i] \geqslant 0\), \(i=1,\dots ,n\), which in turn implies that \({{\mathbb {E}}}[X'_i|I_{A_i}=1]\geqslant 0\) for each component \(X'_i\) of \({{\textbf{X}}}'\), or, equivalently, that \(\max _i Y'_i(\omega ,t) \geqslant 0\) whenever \((\omega ,t)\not \in A\). Consequently, \({\mathbb {P}}'[\max \{Y'_1,\dots ,Y'_n\}\geqslant 0] \geqslant 1-{\mathbb {P}}'[A] = {{\mathbb {E}}}[{\textbf{e}}^\top {{\textbf{W}}}]\).
If there exists \({{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}\) such that \({\mathbb {P}}[\max \{Y_1,\dots ,Y_n\}\geqslant 0]={\mathbb {P}}'[\max \{Y'_1,\dots ,Y'_n\}\geqslant 0]\), then, since \(W_1, \dots , W_n\) in the feasible set of (52) are arbitrary, this would imply that \({\overline{F}}_3({{\textbf{X}}}) = \sup _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}}{\mathbb {P}}[\max \{Y_1,\dots ,Y_n\}\geqslant 0] \geqslant \sup {{\mathbb {E}}}[{\textbf{e}}^\top {{\textbf{W}}}] = {\overline{F}}_2({{\textbf{X}}})\), where the supremum is over the feasible set of (52). Indeed, \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|{{{\mathcal {F}}}'}]\) implies that \({{\textbf{X}}}' \geqslant _{cx} {{\textbf{Y}}}'\). Since \(\Omega\) is atomless, there exists a pair of random vectors \({{\textbf{X}}}'' \in L^{1,n}(\Omega )\) and \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) with the same joint distribution as \(({{\textbf{X}}}', {{\textbf{Y}}}')\). Then \({\mathbb {P}}[\max \{Y_1,\dots ,Y_n\}\geqslant 0]={\mathbb {P}}'[\max \{Y'_1,\dots ,Y'_n\}\geqslant 0]\) and \({{\textbf{X}}}'' \geqslant _{cx} {{\textbf{Y}}}\). Since \({{\textbf{X}}}\) and \({{\textbf{X}}}''\) have the same distribution, \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), as required.
Finally, \({\overline{F}}_2({{\textbf{X}}}) \geqslant {\overline{F}}_1({{\textbf{X}}})\) follows from the fact that (51) is the problem dual to (52). \(\square\)
1.4 D. The algorithm for solving problem (28)
Algorithm 1 (bCDF and rCDF in objective).
Step 1. Set \({{\textbf{a}}}=(1,\dots ,1)\), \(\epsilon\) = small number, I = F or H.
Step 2. Solve \(\min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{p}}_{g_1({{\textbf{a}}},{{\textbf{w}}})}(0)\); let \(\widetilde{{{\textbf{w}}}}\) be its optimal solution.
Step 3. Solve \(\min _{{{\textbf{a}}}\in {{\textbf{A}}}} {\overline{p}}_{g_1({{\textbf{a}}},\widetilde{{{\textbf{w}}}})}(0)\); let \(\widetilde{{{\textbf{a}}}}\) be its optimal solution. Set \(\widetilde{p} = {\overline{p}}_{g_1(\widetilde{{{\textbf{a}}}},\widetilde{{{\textbf{w}}}})}(0)\).
Step 4. Use \(\widetilde{{{\textbf{w}}}}\) as an initial approximation to solve \(\min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{p}}_{g_1(\widetilde{{{\textbf{a}}}},{{\textbf{w}}})}(0)\); let \({{\textbf{w}}}^*\) be its optimal solution.
Step 5. Use \(\widetilde{{{\textbf{a}}}}\) as an initial approximation to solve \(\min _{{{\textbf{a}}}\in {{\textbf{A}}}} {\overline{p}}_{g_1({{\textbf{a}}},{{\textbf{w}}}^*)}(0)\); let \({{\textbf{a}}}^*\) be its optimal solution. Set \(p^* = {\overline{p}}_{g_1({{\textbf{a}}}^*,{{\textbf{w}}}^*)}(0)\).
Step 6. If \(\widetilde{p} - p^* < \epsilon\), then stop; otherwise set \(\widetilde{{{\textbf{w}}}} = {{\textbf{w}}}^*\), \(\widetilde{{{\textbf{a}}}} = {{\textbf{a}}}^*\), \(\widetilde{p} = p^*\), and go to Step 4.
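The alternation in Algorithm 1 can be made concrete in scenario form. The sketch below minimizes a bCDF-type objective \({\mathbb {E}}[\max _i\{a_i X_i({{\textbf{w}}})+1\}]^+\) for two loss components linear in \({{\textbf{w}}}\), alternating the \({{\textbf{w}}}\)- and \({{\textbf{a}}}\)-blocks; the data, problem sizes, and the use of scipy's general-purpose solvers are assumptions for illustration, not the PSG implementation used in the case study.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical scenario data: two loss components, each linear in the decision w
rng = np.random.default_rng(0)
N, K = 2000, 5
L1 = rng.normal(0.0, 1.0, (N, K))   # scenarios of the first loss component
L2 = rng.normal(0.0, 1.0, (N, K))   # scenarios of the second loss component
x = np.array([0.5, 0.5])            # fixed thresholds (an assumption)

def objective(a, w):
    # scenario estimate of E[max{a1*X1(w) + 1, a2*X2(w) + 1}]^+
    X1 = L1 @ w - x[0]
    X2 = L2 @ w - x[1]
    inner = np.maximum(a[0] * X1 + 1.0, a[1] * X2 + 1.0)
    return float(np.mean(np.maximum(inner, 0.0)))

simplex = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
w_bounds = [(0.0, 1.0)] * K
a_bounds = [(0.0, None)] * 2

a, w = np.ones(2), np.full(K, 1.0 / K)     # Step 1: a = (1,...,1)
p_tilde = objective(a, w)
for _ in range(50):                        # Steps 4-6: alternate the two blocks
    w_new = minimize(lambda v: objective(a, v), w,
                     bounds=w_bounds, constraints=simplex).x
    a_new = minimize(lambda b: objective(b, w_new), a, bounds=a_bounds).x
    p_star = objective(a_new, w_new)
    if p_star < p_tilde:                   # accept only improving block updates
        w, a = w_new, a_new
    if p_tilde - p_star < 1e-7:            # stopping test of Step 6
        break
    p_tilde = p_star
print(round(p_tilde, 6))
```

Each block sub-problem is convex for the other block fixed, which is what makes the alternating scheme well-behaved; the accept-only-if-improving guard keeps the iterate sequence monotone even when a generic solver terminates early on the nonsmooth objective.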
1.5 E. The algorithm for solving problem (37)
Algorithm 2 (bCDF and rCDF in constraint).
Step 1. Set \({{\textbf{a}}}=(1,\dots ,1)\), \(\epsilon\) = small number.
Step 2. Solve (37) with respect to \({{\textbf{w}}}\); let \(\widetilde{{{\textbf{w}}}}\) be its optimal solution. Set \(\widetilde{v} = V(\widetilde{{{\textbf{w}}}})\).
Step 3. Solve (38) and (39) with \({{\textbf{w}}}=\widetilde{{{\textbf{w}}}}\); let \(\widetilde{{{\textbf{a}}}}_F\) and \(\widetilde{{{\textbf{a}}}}_H\) be optimal solutions of (38) and (39), respectively.
Step 4. Use \(\widetilde{{{\textbf{w}}}}\) as an initial approximation to solve (37) with respect to \({{\textbf{w}}}\) with \({{\textbf{a}}}_F=\widetilde{{{\textbf{a}}}}_F\) and \({{\textbf{a}}}_H=\widetilde{{{\textbf{a}}}}_H\); let \({{\textbf{w}}}^*\) be its optimal solution. Set \(v^* = V({{\textbf{w}}}^*)\).
Step 5. Use \(\widetilde{{{\textbf{a}}}}_F\) and \(\widetilde{{{\textbf{a}}}}_H\) as initial approximations to solve (38) and (39) with \({{\textbf{w}}}={{\textbf{w}}}^*\); let \({{\textbf{a}}}_F^*\) and \({{\textbf{a}}}_H^*\) be optimal solutions of (38) and (39), respectively.
Step 6. If \(\widetilde{v} - v^* < \epsilon\), then stop; otherwise set \(\widetilde{{{\textbf{w}}}} = {{\textbf{w}}}^*\), \(\widetilde{{{\textbf{a}}}}_F = {{\textbf{a}}}_F^*\), \(\widetilde{{{\textbf{a}}}}_H = {{\textbf{a}}}_H^*\), \(\widetilde{v} = v^*\), and go to Step 4.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Grechuk, B., Zabarankin, M., Mafusalov, A. et al. Buffered and Reduced Multidimensional Distribution Functions and Their Application in Optimization. Optim Lett 18, 403–426 (2024). https://doi.org/10.1007/s11590-023-02045-1