1 Introduction

The cumulative distribution function (CDF) of a random variable (r.v.) is a fundamental notion in probability theory and plays a central role in stochastic optimization, risk management, statistics, reliability theory and various applications. For instance, engineered systems, e.g., electrical grids and gas/oil pipelines, must be designed to comply with various safety and reliability regulations formulated in terms of the probability of failure calculated with CDFs. Also, smart electrical grids should be robust to rare events including power station faults and electromagnetic pulses, whereas gas pipeline operations should meet the consumer demand while being resilient to unforeseen production disruptions. Design of engineered structures and industrial systems involves multiple uncertain characteristics that are difficult to combine into a single loss function. For example, each watertight compartment of a submarine should be designed to maximize its survivability in case of an accident. Suppose that a submarine has eight compartments. The chances of failure of all compartments because of high outside pressure can be described by an eight-dimensional CDF accounting for failures of individual compartments (which are not independent because of common-cause factors: similar compartment design, outside water pressure, etc.). Another example is to ensure an uninterrupted power supply from a system of redundant emergency generators in case of a natural disaster. All generators may be impacted by the same factors, and failures of individual generators are not independent. Yet another example is to evaluate the probability that several escape roads will be blocked simultaneously because of common factors such as a hurricane. In all of these examples, the probability of an event should be assessed for a random vector rather than an r.v.

From an optimization perspective, one of the technical difficulties that engineers face when using a multivariate CDF is that for any fixed \(\textbf{x}\in {{\mathbb {R}}}^n\), the CDF \(F_\textbf{X}(\textbf{x})\) of a random vector \(\textbf{X}\) is not a convex function of \(\textbf{X}\). Moreover, CDFs based on observations (i.e., discrete distributions) are discontinuous piecewise constant functions of \(\textbf{x}\). In this case, stochastic optimization problems result in mixed-integer optimization problems, which are notoriously hard to solve for a large number of scenarios.

The use of the CDF in decision problems also has two conceptual shortcomings. The first is that at any fixed point x, the CDF of an r.v. X does not “capture” the extent of X below x: even if a tail has a small weight, the average value of X in the tail could be quite significant. As a result, reliance just on the CDF may yield solutions with undesirable tail outcomes occurring with small probabilities. The other conceptual shortcoming is that, in general, a safety-first principle based on the CDF of X does not agree with a common perception that aggregation reduces volatility. This perception can be “captured” by the notion of convex order. Informally, an r.v. \( X \) dominates an r.v. \( Y \) in convex order, and we write \(X \geqslant _{cx} Y\), if the distribution of \( Y \) can be obtained from the distribution of \( X \) by aggregation. A formal definition is that \(X \geqslant _{cx} Y\) if \({{\mathbb {E}}}[g(X)] \geqslant {{\mathbb {E}}}[g(Y)]\) for any convex function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\).
The relation \(X \geqslant _{cx} Y\) implies that \({{\mathbb {E}}}[X]={{\mathbb {E}}}[Y]\) and \(\sigma (X) \geqslant \sigma (Y)\), where \(\sigma (\cdot )\) denotes standard deviation; this supports the perception that Y, being aggregated from X and having the same expected value as X, is less volatile than X. A functional f is called Schur-convex if \(X \geqslant _{cx} Y\) implies that \(f(X) \geqslant f(Y)\). In other words, a Schur-convex functional does not increase after data aggregation. Examples of such functionals include \(f(X)=\sigma (X)\) and \(f(X)=-{{\mathbb {E}}}[u(X)]\) for a concave function u. However, \(f(X)=F_X(x)={\mathbb {P}}[X\leqslant x]\) with fixed x is not Schur-convex. As a result, if \(X^*\in \mathop {\mathrm{arg\,min}}_{X\in {{{\mathcal {X}}}}} F_X(x)\) for a feasible set \({{{\mathcal {X}}}}\) and \(x \in {\mathbb {R}}\), there can exist \(Y\in {{{\mathcal {X}}}}\) such that \(X^* \geqslant _{cx} Y\) and \(F_{X^*}(x)< F_Y(x)\). In this case, \({{\mathbb {E}}}[Y]={{\mathbb {E}}}[X^*]\) and \(\sigma (Y) \leqslant \sigma (X^*)\), but Y is “less safe” than \(X^*\).

In the one-dimensional case, i.e., for an r.v. X, all these CDF deficiencies were addressed with the superquantile and buffered probability of exceedance (bPOE) functions [6]. The superquantile, also known as expected shortfall and conditional value-at-risk (CVaR) [14, 15], of X with confidence level \(\alpha\) is the expectation of the right \((1-\alpha )\)-tail of the distribution of X, i.e., it is the average of the largest outcomes with total probability \(1-\alpha\). The superquantile can be defined as follows:

$$\begin{aligned} {\overline{q}}_X(\alpha )=\frac{1}{1-\alpha }\int _\alpha ^1 q_X(s)ds\equiv \min _{c\in {\mathbb {R}}}\left( c+\frac{1}{1-\alpha } {\mathbb {E}}[X-c]^+\right) ,\qquad \alpha \in [0,1), \end{aligned}$$

where \(q_X(s) = \inf \{x \in {\mathbb {R}}\,|\,F_X(x)>s\}\) is the quantile function and \(z^+\) denotes \(\max \{0,z\}\). bPOE is an extension of the buffered probability of failure [13] and equals one minus the inverse of the superquantile, see [6]:

$$\begin{aligned} {\overline{p}}_X(x) = 1-{\overline{q}}^{\;-1}_X (x)\;\; \text {for} \;\; x \ne \sup X. \end{aligned}$$

It can also be evaluated by

$$\begin{aligned} {\overline{p}}_X(x) = \min _{a\geqslant 0} {\mathbb {E}}[a(X-x) + 1]^+ \;\; \text {for} \;\; x \ne \sup X, \end{aligned}$$
(1)

see [6, 7]. Considered as a function of the r.v. X for fixed x, bPOE is the minimal quasi-convex upper bound (see [6]) of the probability of exceedance (POE),

$$\begin{aligned} p_X(x)={\mathbb {P}}[ X \geqslant x] \;\;\text {for} \;\;x \ne \sup X. \end{aligned}$$

Superquantile and bPOE constraints are equivalent, see [6]:

$$\begin{aligned} {\overline{p}}_X(x) \leqslant 1-\alpha \quad \Longleftrightarrow \quad {\overline{q}}_{X}(\alpha ) \leqslant x \;\;\text {for} \;\; x \ne \sup X, \;\; \alpha \in [0,1). \end{aligned}$$
(2)

The aim of this work is to construct extensions of the multivariate CDF defined for the lower tail rather than the upper tail. The analogue of the superquantile for the lower tail will be called the subquantile:

$$\begin{aligned} {\underline{q}}_X(\alpha )=\frac{1}{\alpha }\int _0^\alpha q_X(s)ds\equiv \max _{c\in {\mathbb {R}}}\left( c-\frac{1}{\alpha }{\mathbb {E}}[c-X]^+\right) ,\quad \alpha \in (0,1]. \end{aligned}$$
(3)

To define an analog of bPOE for the lower tail, let an r.v. X with CDF \(F_X:{\mathbb {R}}\rightarrow [0,1]\) represent a reward of some kind, e.g., money, energy, satisfied demand, etc. Then the buffered CDF (bCDF) is defined as the inverse of the subquantile, i.e., \({\overline{F}}_X(x)=\alpha\), where \(\alpha \in (0,1]\) is a solution of the equation \(x={\underline{q}}_X(\alpha )\). In other words,

$$\begin{aligned} {\overline{F}}_X(x)=\alpha \quad \Longleftrightarrow \quad x={\underline{q}}_X(\alpha )\;\;\text {for} \;\; x \ne \inf X, \;\; \alpha \in (0,1]. \end{aligned}$$
(4)

For every fixed x, bCDF \({\overline{F}}_X(x)\) is a quasi-convex function of r.v. X and is the minimal quasi-convex upper bound of \(F_X(x)\). This follows from the fact that bPOE is the minimal quasi-convex upper bound of POE [6]; see also [18, Eq’s (3.2.6)–(3.2.8)] and [2, Eq (30)]. bCDF admits a representation similar to (1):

$$\begin{aligned} {\overline{F}}_X (x) = {\overline{p}}_{-X}(-x)=\min _{a\geqslant 0} {\mathbb {E}}[a(x-X) + 1]^+. \end{aligned}$$
(5)

Similar to the constraint equivalence (2) for the upper tail, (4) yields constraint equivalence for the lower tail:

$$\begin{aligned} {\overline{F}}_X(x)\leqslant \alpha \quad \Longleftrightarrow \quad {\underline{q}}_X(\alpha )\geqslant x\quad \Longleftrightarrow \quad \max _{c\in {\mathbb {R}}}\left( c-\frac{1}{\alpha }{\mathbb {E}}[c-X]^+\right) \geqslant x\;\;\text {for} \;\; x \ne \inf X, \;\; \alpha \in (0,1]. \end{aligned}$$

Under certain conditions, the maximum in the last constraint can be dropped, and the constraint takes the form \(c-\alpha ^{-1}{\mathbb {E}}[c-X]^+\geqslant x\), which can be reformulated as linear constraints for discrete distributions. However, the main advantage of bCDF compared to CDF is that bCDF describes the distribution tails. For instance, when r.v. \(X\) represents a portfolio value and depends on the portfolio weights (decision variables), minimization of \({\overline{F}}_X(x)\) yields an optimal solution \(X^*\) with \({\overline{F}}_{X^*}(x)=\alpha ^*\) such that the average value of \(X^*\) in \(\alpha ^*\cdot 100\%\) of the worst outcomes is at least x. Under some mild conditions, optimization of one-dimensional bCDF can be reduced to convex and linear programming (for discrete distributions). For example, let \({{\textbf{X}}}= (X_1,\dots ,X_m)\) be a known random vector, \(x\in {\mathbb {R}}\) a given threshold, \({{\textbf{w}}}=( w_1,\dots ,w_m) \in {\mathbb {R}}^m\) an unknown vector, and \(X={{\textbf{w}}}^{\! \top } {{\textbf{X}}}= w_1 X_1 +\cdots + w_m X_m\). Then, the problem of minimizing \({\overline{F}}_X(x)\) with respect to \({{\textbf{w}}}\) can be formulated with (5) as

$$\begin{aligned} \min _{{{\textbf{w}}}} {\overline{F}}_X(x) = \min _{{{\textbf{w}}}} \min _{a\geqslant 0}{\mathbb {E}}[a(x - {{\textbf{w}}}^{\! \top } {{\textbf{X}}})+1]^+=\min _{{\textbf{c}}, \,a\geqslant 0}{\mathbb {E}}[a\,x - {\textbf{c}} ^{\! \top } {{\textbf{X}}}+1]^+, \end{aligned}$$

where \({\textbf{c}} =a\,{{\textbf{w}}}\) is a new decision vector. The last optimization problem can be reduced to convex and linear programming for a discrete distribution as in [6]. This also follows from a simple observation that

$$\begin{aligned} {\overline{F}}_X(x) = {\overline{p}}_{x - {{\textbf{w}}}^{\! \top } {{\textbf{X}}}}(0). \end{aligned}$$
(6)
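For an equally weighted scenario approximation of \({{\textbf{X}}}\), the reduction just described becomes an explicit linear program after the standard epigraph reformulation of \([\cdot ]^+\). The following is a minimal sketch in Python with cvxpy; the scenario data, dimensions and threshold are illustrative assumptions of the example, not part of the original formulation.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, m = 1000, 4                           # scenarios and instruments (illustrative)
X = rng.normal(0.05, 0.2, size=(N, m))   # equally likely scenarios of the vector X
x = 0.0                                  # reward threshold

a = cp.Variable(nonneg=True)             # scaling variable a >= 0
c = cp.Variable(m)                       # c = a*w after the change of variables
z = cp.Variable(N, nonneg=True)          # epigraph variables for [a x - c^T X + 1]^+

prob = cp.Problem(cp.Minimize(cp.sum(z) / N), [z >= a * x - X @ c + 1])
prob.solve()

print("min_w bCDF of w^T X at threshold x:", prob.value)
w = c.value / a.value if a.value > 1e-12 else None   # recover portfolio weights w = c/a
```

The epigraph variables \(z_i\) carry the positive part, so the number of scenarios enters only through linear constraints.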

Rockafellar and Royset [12] introduced the super CDF (sCDF) as the inverse of the superquantile, i.e., \({\underline{F}}_{X}(x)={\overline{q}}^{\,-1}_X (x) = 1- {\overline{p}}_X(x)\), which can be shown to be the maximal quasi-concave lower bound of the CDF. It follows from (1) that

$$\begin{aligned} {\underline{F}}_{X}(x) = 1- {\overline{p}}_X(x) =1-\min _{a\geqslant 0} {\mathbb {E}}[a(X-x) + 1]^+ = \max _{a \geqslant 0} {\mathbb {E}}[\min \{a(x-X),1\}]. \end{aligned}$$
(7)

We will call the super CDF the reduced CDF (rCDF) to emphasize that this function is a lower bound of the CDF, in contrast to the buffered CDF, which is an upper bound of the CDF.

It follows from (5) and (7) that bCDF and rCDF are related by

$$\begin{aligned} {\underline{F}}_{X}(x) = 1-{\overline{F}}_{-X}(-x),\qquad {\overline{F}}_{X}(x) = 1-{\underline{F}}_{-X}(-x). \end{aligned}$$
(8)

Generalizing bCDF and rCDF to random vectors is not straightforward. The difficulty is that in the multivariate case, the notions of quantile and its inverse are not well-defined. A potential insight can be offered through exploring the properties of (5) and (7). This work generalizes bCDF and rCDF given by (5) and (7) to random vectors. In contrast to the one-dimensional case, the multivariate bCDF and rCDF are not quasi-convex and quasi-concave functions of a random vector, respectively. We show that they are the minimal Schur-convex upper bound and the maximal Schur-concave lower bound of the multivariate CDF, respectively, and that their special structure allows us to construct an algorithm that can quickly find their local extrema. We demonstrate the efficiency of the algorithm in a case study on optimization of a collateralized debt obligation with bCDF functions in constraints.

The paper is organized into five sections and five appendices. Section 2 introduces bCDF and rCDF and shows that they are upper and lower bounds for a multivariate CDF. Section 3 considers bCDF and rCDF in optimization problems. Section 4 presents applications of bCDF and rCDF. Section 5 discusses the case study. Appendices A, B and C present proofs of three propositions and Appendices D and E summarize algorithms for solving optimization problems with bCDF and rCDF in objective function and constraints.

2 The lower and upper bounds for multivariate CDF

Let \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) be a probability space, where \(\Omega\) is an arbitrary non-empty set, \({{{\mathcal {F}}}}\) is the \(\sigma\)-algebra of subsets of \(\Omega\), and \({\mathbb {P}}\) is a probability measure on \((\Omega ,{{{\mathcal {F}}}})\). Sets in \({{{\mathcal {F}}}}\) are called events. A probability space \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) is called atomless if there exists an r.v. on it with a continuous CDF.

An n-dimensional random vector \({{\textbf{X}}}=(X_1, \dots , X_n)\) is a function \({{\textbf{X}}}:\Omega \rightarrow {\mathbb {R}}^n\) such that for every \({{\textbf{x}}}\in {\mathbb {R}}^n\) the set \(\{\omega \in \Omega \,|\,{{\textbf{X}}}(\omega ) \leqslant {{\textbf{x}}}\}\) is an event, and \(X_1, \dots , X_n\) are called r.v.’s. On an atomless probability space, there exists a collection of random vectors with any given joint distribution. Let \(L^{1,n}(\Omega )\) and \(L^{\infty ,n}(\Omega )\) denote the sets of n-dimensional random vectors \({{\textbf{X}}}\) for which \({{\mathbb {E}}}[{{\textbf{X}}}]=\int _\Omega {{\textbf{X}}}(\omega )d{\mathbb {P}}\) and \(\max\limits_i \sup |{{X_i}}|\) exist and are finite, respectively.

Definition 1

A random vector \({{\textbf{X}}}\in L^{1,n}(\Omega )\) dominates random vector \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) in convex order, and we write \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), if \({{\mathbb {E}}}[g({{\textbf{X}}})] \geqslant {{\mathbb {E}}}[g({{\textbf{Y}}})]\) for any convex function \(g:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\).

The convexity of g implies the existence of \({{\textbf{a}}}\in {\mathbb {R}}^n\) and \(b \in {\mathbb {R}}\) such that \(g({{\textbf{x}}}) \geqslant {{\textbf{a}}}^\top {{\textbf{x}}} + b\) for all \({{\textbf{x}}}\in {\mathbb {R}}^n\), so that \({{\mathbb {E}}}[g({{\textbf{X}}})] \geqslant {{\mathbb {E}}}[{{\textbf{a}}}^\top {{\textbf{X}}}+ b] = {{\textbf{a}}}^\top {{\mathbb {E}}}[{{\textbf{X}}}] + b > -\infty\) is well-defined for every \({{\textbf{X}}}\in L^{1,n}(\Omega )\), although it can be \(+\infty\).

Definition 2

A function \(f:L^{1,n}(\Omega ) \rightarrow {{\mathbb {R}}}\) is Schur-convex if \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) implies \(f({{\textbf{X}}}) \geqslant f({{\textbf{Y}}})\). A function f is Schur-concave if \(-f\) is Schur-convex.

Definition 2 implies that any function of the form \(f({{\textbf{X}}})={{\mathbb {E}}}[g({{\textbf{X}}})]\) is Schur-convex if g is convex and Schur-concave if g is concave. Any function f has a unique minimal Schur-convex upper bound given by

$$\begin{aligned} f_u(\textbf{X}) = \sup _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}} f({{\textbf{Y}}}), \end{aligned}$$

see [6], as well as a unique maximal Schur-concave lower bound

$$\begin{aligned} f_l(\textbf{X}) = \inf _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}} f({{\textbf{Y}}}). \end{aligned}$$

Let

$$\begin{aligned} F_{{\textbf{X}}}({{\textbf{x}}})={\mathbb {P}}[{{\textbf{X}}}\leqslant {{\textbf{x}}}] \end{aligned}$$

denote the multivariate CDF of \({{\textbf{X}}}\).

Definition 3

For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), buffered CDF (bCDF) and reduced CDF (rCDF) are defined by

$$\begin{aligned}{} & {} {\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})=\inf _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}\left[ {{\textbf{a}}}^\top ({{\textbf{x}}}-{{\textbf{X}}}) + 1\right] ^+, \end{aligned}$$
(9)
$$\begin{aligned}{} & {} {\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})=\sup _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}[\min \{1, a_1 (x_1-X_1),\dots , a_n (x_n-X_n)\}], \end{aligned}$$
(10)

respectively, where \({{\textbf{a}}}=(a_1, \dots , a_n)\), \({{\textbf{a}}}^{\! \top } {{\textbf{X}}}=\sum _{i=1}^n a_i X_i\), and \({\mathbb {R}}^n_+=\{{{\textbf{a}}}\in {\mathbb {R}}^n, \, {{\textbf{a}}}\geqslant 0\}\).

Remark

With an auxiliary variable \(a_0 \geqslant 0\) and the following set (simplex)

$$\begin{aligned} {{\textbf{A}}}=\left\{ {{\textbf{a}}}\in {\mathbb {R}}^n_+ \,\left| \, \sum _{i=1}^n a_i=n \right. \right\} , \end{aligned}$$
(11)

bCDF and rCDF, defined by (9) and (10), respectively, can be recast as

$$\begin{aligned}{} & {} {\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})=\inf _{{{\textbf{a}}}\in {{\textbf{A}}},\; a_0 \in {\mathbb {R}}_+} {\mathbb {E}}\left[ a_0 \,{{\textbf{a}}}^\top ({{\textbf{x}}}-{{\textbf{X}}}) + 1\right] ^+ = \inf _{{{\textbf{a}}}\in {{\textbf{A}}}} {\overline{p}}_{{{\textbf{a}}}^\top ({{\textbf{x}}}-{{\textbf{X}}})}(0), \end{aligned}$$
(12)
$$\begin{aligned}{} & {} \begin{aligned} {\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})&= 1-\inf _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}\left[ 1+\max \{a_1 (X_1-x_1),\dots ,a_n (X_n-x_n)\}\right] ^+\\&= 1-\inf _{a_0 \geqslant 0} \inf _{{{\textbf{a}}}\in {{\textbf{A}}}} {\mathbb {E}}\left[ 1+a_0\max \{a_1 (X_1-x_1),\dots ,a_n (X_n-x_n)\}\right] ^+ \\&= 1-\inf _{{{\textbf{a}}}\in {{\textbf{A}}}} {\overline{p}}_{\max \{a_1 (X_1-x_1),\dots ,a_n (X_n-x_n)\}}(0), \end{aligned} \end{aligned}$$
(13)

provided that the infima in (12) and (13) with respect to \(a_0\geqslant 0\) are attained. Representations (12) and (13) show that the bCDF and rCDF of a random vector \({{\textbf{X}}}\) are, in fact, bPOEs of corresponding r.v.’s, minimized with respect to the vector \({{\textbf{a}}}\) over the simplex \({{\textbf{A}}}\). These representations are important in devising efficient algorithms for solving optimization problems with bCDF and rCDF. If \({{\textbf{X}}}\) depends on decision variables, optimal decision problems involving bCDF and rCDF in either the objective function or constraints are bilevel optimization problems, see Sect. 3.
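For an equally weighted scenario set, both (9) and (10) can be evaluated directly as convex problems in \({{\textbf{a}}}\). The following minimal sketch (the distribution, dimensions and evaluation point are illustrative assumptions of the example) computes both bounds together with the empirical CDF, so the ordering established in Proposition 2.1 below can be observed numerically.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, n = 2000, 3
cov = 0.04 * np.eye(n) + 0.02            # correlated components (illustrative)
X = rng.multivariate_normal(np.zeros(n), cov, size=N)   # equally likely scenarios
x = np.zeros(n)                          # evaluation point

a = cp.Variable(n, nonneg=True)

# bCDF (9): minimize E[a^T (x - X) + 1]^+ over a >= 0
bcdf = cp.Problem(cp.Minimize(cp.sum(cp.pos((x - X) @ a + 1)) / N)).solve()

# rCDF (10): maximize E[min{1, a_1 (x_1 - X_1), ..., a_n (x_n - X_n)}] over a >= 0
S = (x - X) @ cp.diag(a)                 # S[j, i] = a_i (x_i - X_{j,i})
rcdf = cp.Problem(cp.Maximize(cp.sum(cp.minimum(1, cp.min(S, axis=1))) / N)).solve()

cdf = np.mean(np.all(X <= x, axis=1))    # empirical F_X(x)
print(f"rCDF {rcdf:.3f} <= CDF {cdf:.3f} <= bCDF {bcdf:.3f}")
```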

First we show that bCDF and rCDF are the upper and lower bounds for CDF, respectively.

Proposition 2.1

For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\) and any \({{\textbf{x}}}\in {\mathbb {R}}^n\),

$$\begin{aligned} {\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}}) \leqslant {\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}] \leqslant {\mathbb {P}}[{{\textbf{X}}}\leqslant {{\textbf{x}}}] = F_{{{\textbf{X}}}}({{\textbf{x}}}) \leqslant {\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}}). \end{aligned}$$
(14)

Proof

Let \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) be arbitrary, and let

$$\begin{aligned} A = \{\omega \in \Omega : \, {{\textbf{X}}}(\omega )< {{\textbf{x}}}\} \end{aligned}$$

be the event that \({{\textbf{X}}}(\omega )< {{\textbf{x}}}\). Then for any \(\omega \not \in A\) we have \(X_i(\omega )\geqslant x_i\) for some i, so that

$$\begin{aligned} \min \{1, a_1 (x_1-X_1(\omega )),\dots , a_n (x_n-X_n(\omega ))\} \leqslant a_i(x_i-X_i(\omega )) \leqslant 0 = I_A(\omega ), \end{aligned}$$

where \(I_A\) is the indicator function of the event A. Also, for \(\omega \in A\),

$$\begin{aligned} \min \{1, a_1 (x_1-X_1(\omega )),\dots , a_n (x_n-X_n(\omega ))\} \leqslant 1 = I_A(\omega ), \end{aligned}$$

and consequently,

$$\begin{aligned} {\mathbb {E}}[\min \{1, a_1 (x_1-X_1),\dots , a_n (x_n-X_n)\}] \leqslant {\mathbb {E}}[I_A] = {\mathbb {P}}[A] = {\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]. \end{aligned}$$

Since \(a \in {\mathbb {R}}^n_+\) is arbitrary, the last inequality and (10) yield the first inequality in (14). Similarly, if

$$\begin{aligned} B = \{\omega \in \Omega : \, {{\textbf{X}}}(\omega )\leqslant {{\textbf{x}}}\} \end{aligned}$$

is the event that \({{\textbf{X}}}(\omega )\leqslant {{\textbf{x}}}\), then for any \(\omega \in B\),

$$\begin{aligned}{}[{{\textbf{a}}}^\top ({{\textbf{x}}}- {{\textbf{X}}}(\omega )) + 1]^+ \geqslant [0+1]^+ = 1 = I_B(\omega ), \end{aligned}$$

and for any \(\omega \not \in B\),

$$\begin{aligned}{}[{{\textbf{a}}}^\top ({{\textbf{x}}}- {{\textbf{X}}}(\omega )) + 1]^+ \geqslant 0 = I_B(\omega ), \end{aligned}$$

so that

$$\begin{aligned} {\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{x}}}-{{\textbf{X}}}) + 1]^+ \geqslant {\mathbb {E}}[I_B] = {\mathbb {P}}[B] = {\mathbb {P}}[{{\textbf{X}}}\leqslant {{\textbf{x}}}] = F_{{{\textbf{X}}}}({{\textbf{x}}}), \end{aligned}$$

which along with (9) yields the last inequality in (14). \(\square\)

It will be shown later that bCDF and rCDF are the minimal Schur-convex upper bound of \(F_{{{\textbf{X}}}}({{\textbf{x}}})\) and the maximal Schur-concave lower bound of \({\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]\), respectively.

For an r.v. X,

$$\begin{aligned} {\overline{F}}_{X}(x)={\overline{p}}_{-X}(-x) \quad \text {and} \quad {\underline{F}}_{X}(x) = 1-{\overline{p}}_{X}(x), \end{aligned}$$

where \({\overline{p}}_X(x)\) is the (one-dimensional) bPOE defined in (1).

Also, note that

$$\begin{aligned} 1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})&= 1-\sup _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}[\min \{1, a_1 (x_1-X_1),\dots , a_n (x_n-X_n)\}] \end{aligned}$$
(15a)
$$\begin{aligned}&= \inf _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}[\max \{a_1 (X_1-x_1),\dots , a_n (X_n-x_n)\} + 1]^+ \end{aligned}$$
(15b)
$$\begin{aligned}&\leqslant \inf _{a_0 \geqslant 0} {\mathbb {E}}[a_0 \max \{X_1 -x_1, \dots , X_n-x_n\} + 1]^+ \end{aligned}$$
(15c)
$$\begin{aligned}&= {\overline{p}}_{\max \{X_1-x_1, \dots , X_n-x_n\}}(0). \end{aligned}$$
(15d)

Note that \(F_{{\textbf{X}}}(\textbf{x})\) can be represented by

$$\begin{aligned} F_{{\textbf{X}}}(\textbf{x})={\mathbb {P}}[{{\textbf{X}}}\leqslant \textbf{x}] = \sup _{W \in \{0,1\}} {\mathbb {E}}[W]\quad \text {subject to}\quad X_i W \leqslant x_i W,\quad i=1,\dots ,n, \end{aligned}$$
(16)

where “\(W \in \{0,1\}\)” indicates that the supremum is over all random variables such that \(W(\omega )\in \{0,1\}\) for every \(\omega \in \Omega\), while the constraint \(X_i W \leqslant x_i W\) means that \(X_i(\omega ) W(\omega ) \leqslant x_i W(\omega )\) for all \(\omega \in \Omega\). The supremum in (16) is attained when W is the indicator function of the event \({{\textbf{X}}}\leqslant \textbf{x}\). Similarly, bCDF has the following interpretation through dual characterization.

Proposition 2.2

For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\) and any \({{\textbf{x}}}\in {\mathbb {R}}^n\),

$$\begin{aligned} {\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}}) = \sup _{W \in L^{1}(\Omega )} \, {\mathbb {E}}[W]\quad \text {subject to}\quad {\mathbb {E}}[X_i W] \leqslant x_i\, {\mathbb {E}}[W], \quad i=1,\dots ,n, \quad W\in [0,1], \end{aligned}$$

where \(W\in [0,1]\) means that \(0 \leqslant W(\omega ) \leqslant 1\) for all \(\omega \in \Omega\).

Proof

See Appendix A. \(\square\)
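For an equally weighted scenario model, the characterization in Proposition 2.2 is itself a small linear program in the scenario values of W. A minimal transcription follows; the data are illustrative assumptions of the example.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
N, n = 500, 2
X = rng.normal(size=(N, n))     # equally likely scenarios of the random vector X
x = np.zeros(n)                 # evaluation point

W = cp.Variable(N)              # W(omega_j), one value per scenario
cons = [W >= 0, W <= 1]
cons += [X[:, i] @ W <= x[i] * cp.sum(W) for i in range(n)]   # E[X_i W] <= x_i E[W]
val = cp.Problem(cp.Maximize(cp.sum(W) / N), cons).solve()
print("bCDF via the dual characterization:", val)
```

By Proposition 2.2, this value should coincide with the one obtained by minimizing (9) directly over \({{\textbf{a}}}\in {\mathbb {R}}^n_+\).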

Proposition 2.3

If the underlying probability space \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) is atomless, then \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is the minimal Schur-convex upper bound for \(F_{{{\textbf{X}}}}({{\textbf{x}}})\).

Proof

See Appendix B. \(\square\)

When \(n=1\), \({\overline{F}}_X(x)\) is also the unique minimal quasi-convex upper bound for \(F_X(x)\), see [6]. This is not true when \(n\geqslant 2\).

Proposition 2.4

In general (when \(n\geqslant 2\)), \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is not quasi-convex in \({{\textbf{X}}}\).

Proof

Let \(n\geqslant 2\), \({{\textbf{0}}}=(0,0,\dots ,0)\), \({{\textbf{x}}}_0=(1,-1,0,\dots ,0)\in {\mathbb {R}}^n\) and \({{\textbf{X}}}\) be such that \({\mathbb {P}}[{{\textbf{X}}}={{\textbf{x}}}_0]=1\). Then \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=\min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{0}}}-{{\textbf{X}}}) + 1]^+ = \min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} [-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+ = 0\), where the last equality follows from \([-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+\geqslant 0\) and from \([-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+=0\) for \({{\textbf{a}}}=(1,0,\dots ,0)\). Similarly, for \({{\textbf{a}}}=(0,1,0,\dots ,0)\), we have \([{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+=0\), which yields \({\overline{F}}_{-{{\textbf{X}}}}({{\textbf{0}}})=0\). However, \({\overline{F}}_{({{\textbf{X}}}+(-{{\textbf{X}}}))/2}({{\textbf{0}}})={\overline{F}}_{{{\textbf{0}}}}({{\textbf{0}}})=1>0\). Consequently, \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is not quasi-convex. \(\square\)

Remark

The proof of Proposition 2.4 uses constant random vectors. Suppose that \(n=2\) and that \({{\textbf{X}}}\) is a constant random vector, i.e., \({\mathbb {P}}[{{\textbf{X}}}=(x_1,x_2)]=1\). Let

$$\begin{aligned} g(x_1,x_2) = {\overline{F}}_{{{\textbf{X}}}}({{\textbf{0}}}) = \inf _{(a_1,a_2) \in {\mathbb {R}}^2_+} {\mathbb {E}}[(a_1,a_2)^\top (-{{\textbf{X}}}) + 1]^+ = \inf _{(a_1,a_2) \in {\mathbb {R}}^2_+} [a_1(-x_1)+a_2(-x_2) + 1]^+. \end{aligned}$$

If \(x_1 \leqslant 0\) and \(x_2 \leqslant 0\), then \(a_1(-x_1)+a_2(-x_2) + 1 \geqslant 1\), so that \(g(x_1,x_2)=1\). If \(x_1>0\) then for \(a_1=1/x_1\) and \(a_2=0\), we have \(a_1(-x_1)+a_2(-x_2) + 1 = 0\), and consequently, \(g(x_1,x_2)=0\). Similarly, if \(x_2>0\) then \(g(x_1,x_2)=0\). Thus, \(g(x_1,x_2)=1\) when \(x_1\leqslant 0\) and \(x_2 \leqslant 0\), and \(g(x_1,x_2)=0\) otherwise. This function is not quasi-convex and cannot be made quasi-convex by altering its values at a finite number of points.

Now let \(I=\{\left. (a'_1, a'_2)\right| a'_1 \geqslant 0, \, a'_2 \geqslant 0, \, a'_1 + a'_2 = 1\}\) be the interval with endpoints (1, 0) and (0, 1). Then \(g(x_1,x_2)\) can be represented as

$$\begin{aligned} g(x_1,x_2) = \inf _{(a'_1,a'_2) \in I} \inf _{a_0 \geqslant 0} [a_0(-a'_1x_1-a'_2x_2) + 1]^+ = \inf _{(a'_1,a'_2) \in I} h_{a'_1,a'_2}(x_1,x_2), \end{aligned}$$

where

$$\begin{aligned} h_{a'_1,a'_2}(x_1,x_2) = \inf _{a_0 \geqslant 0} [a_0(-a'_1x_1-a'_2x_2) + 1]^+. \end{aligned}$$

Observe that \(h_{a'_1,a'_2}(x_1,x_2)=1\) when \(a'_1x_1 + a'_2x_2 \leqslant 0\) and \(h_{a'_1,a'_2}(x_1,x_2)=0\) otherwise, and that \(h_{a'_1,a'_2}(x_1,x_2)\) is quasi-convex. However, the infimum of such functions over \((a'_1,a'_2) \in I\) is not quasi-convex. As \((a'_1,a'_2)\) moves along I from (1, 0) to (0, 1), the line \(a'_1x_1 + a'_2x_2=0\) rotates from the vertical axis \(x_1=0\) to the horizontal axis \(x_2=0\). As a result, \(g(x_1,x_2) = 1\) if and only if the point \((x_1,x_2)\) remains “under” the rotating line (i.e., \(a'_1x_1 + a'_2x_2 \leqslant 0\)) for every position of the line, which occurs if and only if \(x_1 \leqslant 0\) and \(x_2 \leqslant 0\).

Note that \(g(x_1,x_2)\) is the infimum of quasi-convex functions \(h_{a'_1,a'_2}(x_1,x_2)\), but \(g(x_1,x_2)\) is not “closely approximated” by any of these functions. Also, every function \(h_{a'_1,a'_2}(x_1,x_2)\) is a quasi-convex upper bound for \(g(x_1,x_2)\). However, none of these quasi-convex upper bounds is “best” or “minimal”. So, \(g(x_1,x_2)\) has many quasi-convex upper bounds, but the unique “minimal” quasi-convex upper bound does not exist.

In fact, \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is also not quasi-concave in \({{\textbf{X}}}\), even for \(n=2\), see [9]. It can be interpreted through dual characterization in a way similar to that of bCDF. Indeed,

$$\begin{aligned} \begin{aligned} 1 - F_{{\textbf{X}}}(\textbf{x})&= {\mathbb {P}}[X_1> x_1 \text { or } \ldots \text { or } X_n> x_n]\\&=\sup _{W_i \in \{0,1\}} \sum _{i=1}^n {\mathbb {E}}[W_i]\quad \text {subject to}\quad X_i W_i > x_i W_i,\quad i=1,\dots ,n,\quad \sum _{i=1}^n W_i \leqslant 1. \end{aligned} \end{aligned}$$
(17)

Proposition 2.5

Define

$$\begin{aligned} \begin{aligned} {\underline{F}}^D_{{{\textbf{X}}}}({{\textbf{x}}}) = \inf _{{{\textbf{W}}}\in L^{\infty ,n}(\Omega )} \,&\left( 1-\sum _{i=1}^n {\mathbb {E}}[W_i]\right) \\ \text {subject to}&\quad {\mathbb {E}}[X_i W_i] \geqslant x_i\,{\mathbb {E}}[W_i],\;\; W_i\in [0,1],\quad i=1,\dots ,n,\;\; \sum _{i=1}^n W_i\in [0,1]. \end{aligned} \end{aligned}$$

Also, let

$$\begin{aligned} {\underline{F}}^L_{{{\textbf{X}}}}({{\textbf{x}}})=\inf _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}} {\mathbb {P}}[{{\textbf{Y}}}< {{\textbf{x}}}] \end{aligned}$$

be the maximal Schur-concave lower bound for \({\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]\). Then on an atomless probability space,

$$\begin{aligned} {\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}}) = {\underline{F}}^D_{{{\textbf{X}}}}({{\textbf{x}}}) = {\underline{F}}^L_{{{\textbf{X}}}}({{\textbf{x}}}). \end{aligned}$$

Proof

See Appendix C. \(\square\)

For an illustration of these results, let r.v.’s \(X_1\) and \(X_2\) model profits from two different investments and let an investor evaluate the probability that both \(X_1\) and \(X_2\) are non-positive. Let \({{\textbf{X}}}=(X_1,X_2)\), \({{\textbf{0}}}=(0,0)\), and let \(F_{{{\textbf{X}}}}({{\textbf{0}}})={\mathbb {P}}[{{\textbf{X}}}\leqslant {{\textbf{0}}}]\). Let \({{\textbf{X}}}\) assume the values \((-1,-1)\) and (1, 1) with probabilities 1/3 and 2/3, respectively. Then \(F_{{{\textbf{X}}}}({{\textbf{0}}})=1/3\). Suppose \({{\textbf{X}}}\) takes the values \((-1,-1)\), (1, 1), and (1, 1) on the elementary events \(\omega _1\), \(\omega _2\) and \(\omega _3\) with probabilities \({\mathbb {P}}[\omega _1]={\mathbb {P}}[\omega _2]={\mathbb {P}}[\omega _3]=1/3\). Let \({{{\mathcal {F}}}}=\{A,B\}\) be the partition of the probability space into the events \(A=\{\omega _1,\omega _2\}\) and \(B=\{\omega _3\}\). Then \({{\textbf{Y}}}={\mathbb {E}}[{{\textbf{X}}}|{{{\mathcal {F}}}}]\) is a random vector equal to (0, 0) and (1, 1) with probabilities 2/3 and 1/3, respectively. Note that \({{\textbf{Y}}}\) is an "averaged" version of \({{\textbf{X}}}\). However, we have \(F_{{{\textbf{Y}}}}({{\textbf{0}}})=2/3 > 1/3 = F_{{{\textbf{X}}}}({{\textbf{0}}})\). This means that \(F_{{{\textbf{X}}}}({{\textbf{0}}})\) is not Schur-convex as a function of \({{\textbf{X}}}\). Instead, we suggest using its Schur-convex upper bound given in Proposition 2.3; such a bound does not increase when \({{\textbf{X}}}\) is replaced by its averaged-out version.

This example also demonstrates the convenience of modeling the probability space as atomless even for discrete random variables. If the original random vector \({{\textbf{X}}}\) were defined on a discrete probability space \(\Omega =\{\omega _1,\omega _2\}\) with \({\mathbb {P}}[\omega _1]=1/3\) and \({\mathbb {P}}[\omega _2]=2/3\), the "averaged-out" version \({{\textbf{Y}}}\) of \({{\textbf{X}}}\) would not exist. For this \(\Omega\),

$$\begin{aligned} \sup _{Y \leqslant _{cx} X} F_{{{\textbf{Y}}}}({{\textbf{0}}}) = \frac{1}{3} \ne \frac{2}{3}, \end{aligned}$$

which illustrates that Proposition 2.3 fails on discrete probability spaces.

3 Buffered and reduced CDFs in optimization problems

Let \({{\textbf{X}}}\) be a random vector, \({{\textbf{x}}}\in {\mathbb {R}}^n\), and let \(\textbf{f}({{\textbf{w}}};{{\textbf{X}}})= (f_1({{\textbf{w}}};{{\textbf{X}}}),\ldots , f_n({{\textbf{w}}};{{\textbf{X}}}))\) be a vector of concave functions of \({{\textbf{w}}}\) and \({\textbf{h}}({{\textbf{w}}};{{\textbf{X}}})= (h_1({{\textbf{w}}};{{\textbf{X}}}),\ldots ,h_n({{\textbf{w}}};{{\textbf{X}}}))\) be a vector of convex functions of \({{\textbf{w}}}\), defined on a convex set \({{\textbf{W}}}\subseteq {\mathbb {R}}^k\). Let also \({\mathbb {E}}[|f_i ({{\textbf{w}}}; {{\textbf{X}}})|] < \infty\), \({\mathbb {E}}[|h_i ({{\textbf{w}}}; {{\textbf{X}}})|] < \infty\) for all \({{\textbf{w}}}\in {{\textbf{W}}}\) and \(i\in \{1,\dots ,n\}\).

If all \(f_i\) in the definition of \({\textbf{f}}({{\textbf{w}}};{{\textbf{X}}})\) are convex, and \({{{\mathcal {W}}}}= {{{\mathcal {W}}}}_1 \times \cdots \times {{{\mathcal {W}}}}_n\) with all \({{{\mathcal {W}}}}_1,\dots ,{{{\mathcal {W}}}}_n\) being convex, then the problem \(\min _\mathbf{w \in {{{\mathcal {W}}}}} {\overline{F}}_\mathbf{f(w;X)}(\textbf{x})\) is convex.

3.1 bCDF and rCDF in the objective

This section considers the following two optimization problems

$$\begin{aligned} \min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{F}}_{{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}) \end{aligned}$$
(18)

and

$$\begin{aligned} \max _{{{\textbf{w}}}\in {{\textbf{W}}}}{\underline{F}}_{{\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}). \end{aligned}$$
(19)

With functions \(G_{\! F}({{\textbf{a}}}, {{\textbf{w}}})\) and \(G_{\! H}({{\textbf{a}}},{{\textbf{w}}})\) defined by

$$\begin{aligned} G_{\! F}({{\textbf{a}}}, {{\textbf{w}}}) ={\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})) + 1]^+ \end{aligned}$$

and

$$\begin{aligned} G_{\! H}({{\textbf{a}}},{{\textbf{w}}})= -{\mathbb {E}}[\min \{1, a_1 (x_1-h_1({{\textbf{w}}}; {{\textbf{X}}})),\dots , a_n (x_n-h_n({{\textbf{w}}}; {{\textbf{X}}}))\}], \end{aligned}$$

bCDF and rCDF can be recast as

$$\begin{aligned} {\overline{F}}_{{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}) =\inf _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} G_{\! F}({{\textbf{a}}},{{\textbf{w}}}) \end{aligned}$$
(20)

and

$$\begin{aligned} {\underline{F}}_{{\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}) =\sup _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} - G_{\! H}({{\textbf{a}}},{{\textbf{w}}}). \end{aligned}$$
(21)

It is assumed that in (20) and (21), the infimum and supremum with respect to \({{\textbf{a}}}\) can be replaced by the minimum and maximum, respectively. With this assumption, optimization problems (18) and (19) along with (20) and (21) take the form

$$\begin{aligned} \min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{F}}_{{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}) =\min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+,\;{{\textbf{w}}}\in {{\textbf{W}}}} G_{\! F}({{\textbf{a}}},{{\textbf{w}}}) \end{aligned}$$
(22)

and

$$\begin{aligned} \max _{{{\textbf{w}}}\in {{\textbf{W}}}}{\underline{F}}_{{\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}})=\max _{{{\textbf{a}}}\in {\mathbb {R}}^n_+,\;{{\textbf{w}}}\in {{\textbf{W}}}} -G_{\! H}({{\textbf{a}}},{{\textbf{w}}})=-\min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+,\;{{\textbf{w}}}\in {{\textbf{W}}}} G_{\! H}({{\textbf{a}}},{{\textbf{w}}}), \end{aligned}$$
(23)

respectively, and (22) and (23) can be written as

$$\begin{aligned} \min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+,\;{{\textbf{w}}}\in {{\textbf{W}}}} G_{\! I}({{\textbf{a}}},{{\textbf{w}}}), \quad I\in \{F,H \}. \end{aligned}$$
(24)

Further, with an auxiliary variable \(a_0 \geqslant 0\) and with the set \({{\textbf{A}}}\) defined by (11), problem (24) can be equivalently reformulated in the form

$$\begin{aligned} \min _{{{\textbf{a}}}\in {{\textbf{A}}},\; a_0 \in {\mathbb {R}}_+,\; {{\textbf{w}}}\in {{\textbf{W}}}} G_{\! I}(a_0 \,{{\textbf{a}}},{{\textbf{w}}}),\quad I\in \{F,H \}. \end{aligned}$$
(25)

The idea is to solve (25) iteratively by alternating between the blocks of variables \((a_0, {{\textbf{w}}})\) and \({{\textbf{a}}}\): solve (25) with respect to \((a_0, {{\textbf{w}}})\), then with respect to \({{\textbf{a}}}\), then again with respect to \((a_0, {{\textbf{w}}})\), then again with respect to \({{\textbf{a}}}\), and so on. This method is known as coordinate descent or, more generally, Block Coordinate Descent (BCD), in which the variables are divided into m blocks and optimization is carried out iteratively for one block at a time. BCD may fail to converge to a stationary point of a non-convex objective function even when the objective is convex in each block of coordinates [10]. However, global convergence has been studied under additional assumptions: two blocks (\(m=2\)) or strict quasiconvexity in \(m-2\) of the blocks [3, 4], and uniqueness of the minimizer in each block [1, Sect. 2.7]. See also [5] for convergence of BCD for nonconvex problems and for relevant references. A minimal sketch of this alternating scheme is given below.
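The sketch below treats the bCDF objective, i.e., the case \(I=F\) in (25), and uses general-purpose scipy solvers for each block; the convex reformulations (26)–(27) and the algorithm of Appendix D are the intended way to solve the blocks, so the function name, the generic solvers and the fixed iteration count are assumptions of the example only.

```python
import numpy as np
from scipy.optimize import minimize

def bcd_bcdf(f, X, x, w0, n_iter=20):
    """Block coordinate descent sketch for problem (25) with I = F:
    minimize E[a0 * a^T (x - f(w; X)) + 1]^+ over a in the simplex A
    (a >= 0, sum(a) = n), a0 >= 0 and w.  Here f(w, X) must return an
    (N, n) matrix of f(w; X) over N equally likely scenarios."""
    n = f(np.asarray(w0, float), X).shape[1]
    a = np.ones(n)                        # centre of the simplex A: sum(a) = n
    w, a0 = np.asarray(w0, float), 1.0

    def G(a0_w, a):                       # scenario estimate of G_F(a0*a, w)
        a0_, w_ = a0_w[0], a0_w[1:]
        return np.mean(np.maximum(a0_ * (x - f(w_, X)) @ a + 1.0, 0.0))

    for _ in range(n_iter):
        # block 1: (a0, w) with a fixed -- a bPOE-type minimization, cf. (26)
        res = minimize(G, np.r_[a0, w], args=(a,), method="Nelder-Mead")
        a0, w = max(res.x[0], 0.0), res.x[1:]
        # block 2: a over the simplex A with (a0, w) fixed -- a convex subproblem
        res = minimize(lambda a_: G(np.r_[a0, w], a_), a, method="SLSQP",
                       bounds=[(0.0, None)] * n,
                       constraints={"type": "eq", "fun": lambda a_: a_.sum() - n})
        a = res.x
    return w, a0, a, G(np.r_[a0, w], a)
```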

Note that in our case the optimization problems

$$\begin{aligned} \min _{{{\textbf{a}}}\in {{\textbf{A}}}} G_{\! I}(a_0 \,{{\textbf{a}}},{{\textbf{w}}}),\quad I\in \{F,H \}, \end{aligned}$$

are convex with respect to \({{\textbf{a}}}\), since the objective is the expectation of a pointwise maximum of functions that are linear in the components of the vector \({{\textbf{a}}}\). On the other hand, the optimization problems

$$\begin{aligned} \min _{a_0 \in {\mathbb {R}}_+,\; {{\textbf{w}}}\in {{\textbf{W}}}} G_{\! I}(a_0 \,{{\textbf{a}}},{{\textbf{w}}}) = \min _{ {{\textbf{w}}}\in {{\textbf{W}}}} \min _{a_0 \in {\mathbb {R}}_+} G_{\! I}(a_0 \,{{\textbf{a}}},{{\textbf{w}}}),\quad I\in \{F,H \}, \end{aligned}$$

can be reduced to convex programming in \({{\textbf{w}}}\), since they are bPOE minimization problems. Indeed,

$$\begin{aligned} \begin{aligned} \min _{a_0 \in {\mathbb {R}}_+,\; {{\textbf{w}}}\in {{\textbf{W}}}} G_{\! F}(a_0 \,{{\textbf{a}}}, {{\textbf{w}}}) = \min _{{{\textbf{w}}}\in {{\textbf{W}}}} \min _{a_0 \in {\mathbb {R}}_+} G_{\! F}(a_0 \,{{\textbf{a}}}, {{\textbf{w}}}) =&\min _{{{\textbf{w}}}\in {{\textbf{W}}}} \min _{a_0 \in {\mathbb {R}}_+} {\mathbb {E}}[a_0\, {{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})) + 1]^+ \\ =&\min _{{{\textbf{w}}}\in {{\textbf{W}}}} \,{\overline{p}}_{{{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})) } (0). \end{aligned} \end{aligned}$$
(26)

Since all components of the vector function \({\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})\) are concave in \({{\textbf{w}}}\), the function \({{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}}))\) is convex in \({{\textbf{w}}}\). Consequently, minimization of bPOE with respect to \({{\textbf{w}}}\) for this function can be reduced to convex programming, see [6]. Also, (15a) and (15b) imply that

$$\begin{aligned} \begin{aligned} &\min _{a_0 \in {\mathbb {R}}_+,\; {{\textbf{w}}}\in {{\textbf{W}}}} G_{\! H}(a_0 \,{{\textbf{a}}}, {{\textbf{w}}})\\ &=\min _{{{\textbf{w}}}\in {{\textbf{W}}}} \min _{a_0 \in {\mathbb {R}}_+} -{\mathbb {E}}[\min \{1, a_0 a_1 (x_1-h_1({{\textbf{w}}}; {{\textbf{X}}})),\dots , a_0 a_n (x_n-h_n({{\textbf{w}}}; {{\textbf{X}}}))\}]\\ &=\min _{{{\textbf{w}}}\in {{\textbf{W}}}} \left( - \max _{a_0 \in {\mathbb {R}}_+} {\mathbb {E}}[\min \{1, a_0a_1 (x_1-h_1({{\textbf{w}}}; {{\textbf{X}}})),\dots , a_0 a_n (x_n-h_n({{\textbf{w}}}; {{\textbf{X}}}))\}]\right) \\ &=\min _{{{\textbf{w}}}\in {{\textbf{W}}}} \left( \min _{a_0 \in {\mathbb {R}}_+} {\mathbb {E}}[\max \{ a_0 a_1 (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_0 a_n (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\}+1]^+-1\right) \\ &=-1+ \min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{p}}_{\max \{ a_1 (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_n (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\}}(0). \end{aligned} \end{aligned}$$
(27)

Since all components of the vector function \({\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})\) are convex in \({{\textbf{w}}}\), and components of the vector \({{\textbf{a}}}\) are nonnegative, the function

$$\begin{aligned} \max \{a_1 (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_n (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\} \end{aligned}$$

is convex in \({{\textbf{w}}}\). Consequently, minimization of bPOE for this function can be reduced to convex programming, see [6]. Finally, with bPOE functions, problem (25) can be formulated as follows

$$\begin{aligned} \min _{{{\textbf{a}}}\in {{\textbf{A}}},\; {{\textbf{w}}}\in {{\textbf{W}}}} {\overline{p}}_{g_I({{\textbf{a}}},{{\textbf{w}}})}(0),\quad I\in \{F,H \}, \end{aligned}$$
(28)

where

$$\begin{aligned} \begin{aligned} g_F({{\textbf{a}}},{{\textbf{w}}})&={{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})),\\ g_H({{\textbf{a}}},{{\textbf{w}}})&=\max \{ a_1 (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_n (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\}. \end{aligned} \end{aligned}$$

The algorithm for solving problem (28) is summarized in Appendix D.

3.2 bCDF and rCDF in constraints

This section considers minimization of an objective function \(V({{\textbf{w}}})\) with respect to variables \({{\textbf{w}}}\in {\mathbb {R}}^k\):

$$\begin{aligned} \min _{{{\textbf{w}}}\in {{\textbf{W}}}} V({{\textbf{w}}})\; \end{aligned}$$
(29)

subject to constraints on buffered and reduced CDFs:

$$\begin{aligned}{} & {} {\overline{F}}_{{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}) \leqslant {\overline{\gamma }}, \end{aligned}$$
(30)
$$\begin{aligned}{} & {} {\underline{F}}_{{\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})} ({{\textbf{x}}}) \geqslant {\underline{\gamma }}. \end{aligned}$$
(31)

With the functions \(G_{\! F}(a_F \,{{\textbf{a}}}_F,{{\textbf{w}}})\) and \(G_{\! H}(a_H \,{{\textbf{a}}}_H,{{\textbf{w}}})\) introduced in Sect. 3.1 and under the assumption that in (20) and (21), the infimum and supremum with respect to \({{\textbf{a}}}\) can be replaced by the minimum and maximum, respectively, problem (29)–(31) can be reformulated as follows

$$\begin{aligned} \min _{{{\textbf{w}}}\in {{\textbf{W}}}, \; a_I \in {\mathbb {R}}_+,\;{{\textbf{a}}}_I \in {{\textbf{A}}},\, I\in \{F,H \} } V({{\textbf{w}}}) \end{aligned}$$
(32)

subject to

$$\begin{aligned}{} & {} G_{\! F}(a_F \,{{\textbf{a}}}_F,{{\textbf{w}}})\leqslant {\overline{\gamma }}, \end{aligned}$$
(33)
$$\begin{aligned}{} & {} G_{\! H}(a_H \,{{\textbf{a}}}_H,{{\textbf{w}}})\leqslant -{\underline{\gamma }}. \end{aligned}$$
(34)

Relationship (26) implies that the constraint (33) with respect to variables \((a_F, {{\textbf{a}}}_F, {{\textbf{w}}})\) can be replaced by

$$\begin{aligned} {\overline{p}}_{{{\textbf{a}}}^{\! \top }_F \! ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})) } (0) \leqslant {\overline{\gamma }} \end{aligned}$$

with respect to variables \(({{\textbf{a}}}_F, {{\textbf{w}}})\). By (2), the last constraint is equivalent to the superquantile constraint:

$$\begin{aligned} {\overline{q}}_{{{\textbf{a}}}_{\! F }^{\! \top } ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})) }(1-{\overline{\gamma }}) \leqslant 0. \end{aligned}$$
(35)

Similarly, (27) implies that the constraint (34) with respect to variables \((a_H, {{\textbf{a}}}_H, {{\textbf{w}}})\) can be replaced by

$$\begin{aligned} -1+{\overline{p}}_{\max \{ a_{H1} (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_{Hn} (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\}}(0) \leqslant -{\underline{\gamma }} \end{aligned}$$

with respect to variables \(({{\textbf{a}}}_H, {{\textbf{w}}})\). By (2), the last constraint is equivalent to the superquantile constraint

$$\begin{aligned} {\overline{q}}_{\max \{ a_{H1} (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_{Hn} (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\} }({\underline{\gamma }}) \leqslant 0. \end{aligned}$$
(36)

Finally, problem (32)–(34) can be recast in the form

$$\begin{aligned} \min _{{{\textbf{w}}}\in {{\textbf{W}}}, \;{{\textbf{a}}}_F \in {{\textbf{A}}},\, {{\textbf{a}}}_H \in {{\textbf{A}}}} V({{\textbf{w}}})\;\;\text { subject to (35) and (36).} \end{aligned}$$
(37)

Problem (37) can be solved iteratively as follows: solve (37) with respect to the variables \({{\textbf{w}}}\) with fixed \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), then minimize the superquantile functions in (35) and (36) with respect to \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), respectively, i.e., solve

$$\begin{aligned}{} & {} \min _{{{\textbf{a}}}_F \in {{\textbf{A}}}} {\overline{q}}_{{{\textbf{a}}}_{\! F }^{\! \top } ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})) }(1-{\overline{\gamma }}), \end{aligned}$$
(38)
$$\begin{aligned}{} & {} \min _{{{\textbf{a}}}_H \in {{\textbf{A}}}}{\overline{q}}_{\max \{ a_{H1} (h_1({{\textbf{w}}}; {{\textbf{X}}})-x_1),\dots , a_{Hn} (h_n({{\textbf{w}}}; {{\textbf{X}}})-x_n)\} }({\underline{\gamma }}), \end{aligned}$$
(39)

then solve (37) again with respect to \({{\textbf{w}}}\) with previously found \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), then solve (38) and (39), and so on.
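For an equally weighted scenario set, each superquantile in (35)–(36) and in the subproblems (38)–(39) is just the empirical superquantile of a scenario vector. A minimal helper for that quantity is sketched below (equal scenario weights are an assumption of the sketch; this is not the PSG routine used in the case study).

```python
import numpy as np

def superquantile(y, alpha):
    """Empirical superquantile (CVaR) at level alpha in [0, 1) of equally
    likely scenarios y: the average of the right (1 - alpha)-tail of the
    empirical distribution, splitting the probability atom at the quantile."""
    y = np.sort(np.asarray(y, dtype=float))
    N = y.size
    k = int(np.floor(alpha * N))          # q_Y(s) = y[k] for s in [k/N, (k+1)/N)
    tail = y[k] * ((k + 1) / N - alpha) + y[k + 1:].sum() / N
    return tail / (1.0 - alpha)
```

For fixed \({{\textbf{a}}}_F\) and \({{\textbf{w}}}\), the left-hand side of (35) is this helper applied to the scenario values of \({{\textbf{a}}}_{F}^{\top }({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}};{{\textbf{X}}}))\) at level \(1-{\overline{\gamma }}\).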

The described algorithm for solving problem (37) is summarized in Appendix E.

4 Applications

4.1 Two-asset option

A two-asset option is an exotic option whose payoff depends on prices \(p_A(t)\) and \(p_B(t)\) of assets A and B at time t. For example, it may have nonzero payoff if \(p_A(T)\) and \(p_B(T)\) simultaneously exceed some strike prices \(C_A\) and \(C_B\), respectively, at some future time T. In this case, if \(p_A(T)\) and \(p_B(T)\) are assumed to be random variables, the option pays with the probability

$$\begin{aligned} p={\mathbb {P}}[p_A(T) \geqslant C_A, p_B(T) \geqslant C_B]. \end{aligned}$$
(40)

Since, in general, \(p_A(T)\) and \(p_B(T)\) are not independent random variables, \({\mathbb {P}}[p_A (T)\geqslant C_A, p_B(T) \geqslant C_B]\not ={\mathbb {P}}[p_A(T) \geqslant C_A] \cdot {\mathbb {P}}[p_B(T) \geqslant C_B]\). If \(p_A(0)\) and \(p_B(0)\) are (known) prices of assets A and B at the current time \(t=0\), then (40) can be recast in terms of the adjusted rates of return \(r_A=(p_A(T)-C_A)/p_A(0)\) and \(r_B=(p_B(T)-C_B)/p_B(0)\):

$$\begin{aligned} p={\mathbb {P}}\left[ r_A \geqslant 0, r_B \geqslant 0\right] . \end{aligned}$$

The joint distribution of \(r_A\) and \(r_B\) can be estimated from the historical data. Let \(r_{A,i}\) and \(r_{B,i}\), \(i=1,\dots , T\), be the adjusted rates of return of assets A and B over the last T periods. Then p can be estimated as the number of times \((r_{A,i},r_{B,i})\geqslant 0\) divided by T. However, this estimate is sensitive to small perturbations in the data. Alternatively, we can define \(X=(-r_A,-r_B)\) and estimate the smallest Schur-convex upper bound (9) for p:

$$\begin{aligned} \begin{aligned} {\overline{F}}_{X}(0)&=\inf _{a\geqslant 0,\,b \geqslant 0} {\mathbb {E}}[a\,r_A + b\,r_B + 1]^+ \\&= \inf _{a\geqslant 0,\,b \geqslant 0} \frac{1}{T} \sum _{i=1}^T [a\,r_{A,i} + b\,r_{B,i} + 1]^+\\&=\min _{a\geqslant 0,\; b \geqslant 0,\; z_1,\dots ,z_T}\frac{1}{T} \sum _{i=1}^T z_i\\&\quad \text {subject to}\quad z_i\geqslant a\,r_{A,i} + b\,r_{B,i} + 1,\quad z_i\geqslant 0,\quad i=1,\dots ,T, \end{aligned} \end{aligned}$$

which is a linear program.
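A minimal transcription of this linear program with scipy.optimize.linprog; the return series below are simulated placeholders for the historical data.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
T = 250
rA = rng.normal(0.02, 0.10, T)      # adjusted returns r_{A,i} (illustrative data)
rB = rng.normal(0.01, 0.12, T)      # adjusted returns r_{B,i}

# decision vector v = (a, b, z_1, ..., z_T); minimize (1/T) * sum z_i
cost = np.r_[0.0, 0.0, np.full(T, 1.0 / T)]
# z_i >= a r_{A,i} + b r_{B,i} + 1   <=>   a r_{A,i} + b r_{B,i} - z_i <= -1
A_ub = np.hstack([rA[:, None], rB[:, None], -np.eye(T)])
b_ub = -np.ones(T)
bounds = [(0, None)] * (T + 2)      # a >= 0, b >= 0, z_i >= 0
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("bCDF upper bound on P[r_A >= 0, r_B >= 0]:", res.fun)
```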

4.2 Credit rating

Optimizing a portfolio of credit default swaps (CDSs) with constraints on the default probabilities of tranches [8] is a non-convex optimization problem. Let M be the number of tranches, T the number of time periods, and \(x_m^t\), \(m=1,\dots , M\), \(t=1,\dots ,T\), the attachment point of tranche m at time t. Also, suppose there are K available CDSs. Let \(\theta _k^t\) be the cumulative loss of k-th CDS at time t, and let \(y_k\) be the weight of the k-th asset in the asset pool. Then the total loss of the CDS pool at time t is \(L(\varvec{\theta }^t\!\!, {{\textbf{y}}}) = \sum _{k=1}^K \theta _k^t y_k\), where \(\varvec{\theta }^t = (\theta ^t_1,\dots , \theta ^t_K)\) and \({{\textbf{y}}}=(y_1,\dots , y_K)\). Consider constraints on the default probabilities of tranches [8]:

$$\begin{aligned} 1-{\mathbb {P}}\left[ L(\varvec{\theta }^1\!\!,{{\textbf{y}}})\leqslant x_m^1, \dots , L(\varvec{\theta }^T\!\!,{{\textbf{y}}})\leqslant x_m^T\right] \leqslant p_m, \quad m=2,\dots , M. \end{aligned}$$
(41)

With \(X_t = L(\varvec{\theta }^t,{{\textbf{y}}})-x_m^t\), \(t=1,\dots ,T\), (41) can be recast as

$$\begin{aligned} {\mathbb {P}}\left[ \max \{X_1, \dots , X_T\}\geqslant 0\right] \leqslant p_m, \quad m=2,\dots , M. \end{aligned}$$
(42)

Each constraint in (42) is non-convex and can be approximated by

$$\begin{aligned} {\overline{p}}_{\max \{X_1, \dots , X_T\}}(0)\leqslant {\overline{p}}_m, \end{aligned}$$
(43)

where \({\overline{p}}_m\) is some scaled bound on probability; see [8]. Also, (15b)–(15d) imply that

$$\begin{aligned} 1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}}) \leqslant {\overline{p}}_{\max \{X_1, \dots , X_T\}}(0), \end{aligned}$$

where \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is defined in (10). Consequently, \(1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\) is a better upper bound for \({\mathbb {P}}[\max (X_1, \dots ,X_T) \geqslant 0]\) than \({\overline{p}}_{\max \{X_1, \dots , X_T\}}(0)\), and (43) can be replaced by

$$\begin{aligned} 1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\leqslant {\overline{p}}_m. \end{aligned}$$
(44)

In fact, \(1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\) is the best Schur-convex upper bound on this probability for an atomless probability space; see Proposition 2.5. The optimization problem considered in [8] is formulated as

$$\begin{aligned}&\min _{{{\textbf{x}}},{{\textbf{y}}}}\;\;\sum _{t=1}^T \frac{1}{(1+r)^t} \sum _{m=1}^M \Delta s_m\, {\mathbb {E}}[[x_{m+1}^t - L(\varvec{\theta }^t\!\!,{{\textbf{y}}})]^+] \\&\text {subject to}\;\;1-{\mathbb {P}}\left[ L(\varvec{\theta }^1\!\!,{{\textbf{y}}})\leqslant x_m^1, \dots , L(\varvec{\theta }^T\!\!,{{\textbf{y}}})\leqslant x_m^T\right] \leqslant p_m, \quad m=2,\dots , M, \\&\qquad \qquad \;\;\sum _{k=1}^K c_k y_k \geqslant \eta , \quad \sum _{k=1}^K y_k = 1, \quad y_k \geqslant 0,\quad k=1,\dots , K, \\&\qquad \qquad \;\;x_m^t \geqslant x_{m-1}^t, \quad m=1,\dots ,M,\quad t=1,\dots , T,\\&\qquad \qquad \;\;0 \leqslant x_m^t \leqslant 1, \quad m=2,\dots , M,\quad t=1,\dots ,T, \end{aligned}$$
(45)

where \({{\textbf{x}}}=\{x_m^t\}_{m=2,\dots ,M}^{t=1,\dots ,T}\), r is the interest rate, \(\Delta s_m = s_m - s_{m-1}\), \(s_m\) is the spread payment for each tranche \(m=1, \dots , M\), \(c_k\) is the annual income spread payment of the k-th CDS, \(k=1, \dots , K\), and \(\eta\) is a real-valued parameter. After replacing constraints (41) by (44), the optimization problem (45) takes the form

$$\begin{aligned}&\min _{{{\textbf{x}}},{{\textbf{y}}},{{\textbf{a}}}}\;\;\sum _{t=1}^T \frac{1}{(1+r)^t} \sum _{m=1}^M \Delta s_m\, {\mathbb {E}}[[x_{m+1}^t - L(\varvec{\theta }^t\!\!,{{\textbf{y}}})]^+] \qquad (46\text{a})\\&\text {subject to}\;\;{\mathbb {E}}[\max \{a^m_1 (L(\varvec{\theta }^1\!\!,{{\textbf{y}}}) - x_m^1),\dots , a^m_T (L(\varvec{\theta }^T\!\!,{{\textbf{y}}}) - x_m^T )\} + 1]^+ \leqslant {\overline{p}}_m, \quad m=2,\dots , M, \qquad (46\text{b})\\&\qquad \qquad \;\;a^m_t \geqslant 0,\quad m=2,\dots ,M,\quad t=1,\dots , T, \qquad (46\text{c}) \\&\qquad \qquad \;\;\sum _{k=1}^K c_k y_k \geqslant \eta , \quad \sum _{k=1}^K y_k = 1, \quad y_k \geqslant 0,\quad k=1,\dots , K, \qquad (46\text{d}) \\&\qquad \qquad \;\;x_m^t \geqslant x_{m-1}^t, \;\; m=1,\dots ,M,\;\; t=1,\dots , T, \quad 0 \leqslant x_m^t \leqslant 1, \;\; m=2,\dots , M,\;\; t=1,\dots ,T, \qquad (46\text{e})\end{aligned}$$

where \({{\textbf{a}}}=\{a^m_t\}_{m=2,\dots ,M}^{t=1,\dots ,T}\).
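For scenario data, the left-hand side of (46b) is straightforward to estimate; a minimal sketch follows (the function name and the shapes of its inputs are assumptions of the example).

```python
import numpy as np

def rcdf_constraint_lhs(L_scen, x_m, a_m):
    """Scenario estimate of the left-hand side of (46b):
    E[max_t a^m_t (L(theta^t, y) - x^t_m) + 1]^+ for equally likely scenarios.
    L_scen: (N, T) matrix of simulated pool losses; x_m, a_m: length-T arrays."""
    inner = (np.asarray(L_scen) - np.asarray(x_m)) * np.asarray(a_m)   # (N, T)
    return float(np.mean(np.maximum(inner.max(axis=1) + 1.0, 0.0)))
```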

5 Case study

Veremyev et al. [17] solved the CDS portfolio problem (45) and found an optimal portfolio of CDSs along with optimal attachment/detachment points over a multi-period horizon. Pertaia et al. [8] replaced the multivariate POE constraints in (45) by the one-dimensional bPOE constraints (43) and solved the corresponding optimization problem.

Here we use Algorithm 2 to solve problem (46a)–(46e) for 53 CDSs over 5 time periods with 5 tranches, for 300,000 simulated scenarios, and for \(\eta =120\) and \(\eta =80\) in (46d). The experiments were run on an Intel i7 2.6 GHz processor with 32 GB DDR4-3200 MHz RAM and used Portfolio Safeguard (PSG), which has a precoded superquantile (CVaR) function as well as many other popular risk functions.

Table 1 presents the objective value in Step 2 or 4, the running time in Step 2 or 4, and the superquantile sub-problem running time in Step 3 or 5 for each iteration of Algorithm 2 with \(\eta =120\) and \(\epsilon =10^{-7}\). It shows that no improvements are made after the second iteration (Algorithm 2 stops after the third iteration, but Table 1 shows one more iteration to demonstrate that there are no further improvements in the objective value). Constraint (36) in problem (37), which is solved in Steps 2 and 4, is equivalent to constraints (46b) with \(m=2\), 3, 4, 5. Table 2 reports the slack in constraints (46b) for \(m=2\), 3, 4, 5 as a percentage of the constraint right-hand side, i.e.,

$$\begin{aligned} \delta _m=100\%\cdot \frac{{\overline{p}}_m-{\mathbb {E}}[\max \{a^m_1 (L(\varvec{\theta }^1\!\!,{{\textbf{y}}}) - x_m^1),\dots , a^m_T (L(\varvec{\theta }^T\!\!,{{\textbf{y}}}) - x_m^T )\} + 1]^+}{{\overline{p}}_m}. \end{aligned}$$
(47)

At Step 2 in the first iteration, constraints (46b) for \(m=3\), 4, 5 are active; after that, all constraints in (46b) become inactive until the algorithm stops after the third iteration. Since problem (46a)–(46e) without constraints (46b) is convex and none of the constraints (46b) is active at optimality, the solution found is globally optimal. Note that the one-dimensional bPOE constraints (43) are active, as we see from Step 2 in the first iteration.

Tables 3 and 4 present results similar to those in Tables 1 and 2 but for \(\eta =80\). This time, the objective value stops improving after the second iteration even though constraints (46b) for \(m=3\), 4, 5 remain active in all iterations.

The case study shows that optimization problems with bCDF functions can be solved efficiently, even for large-scale problems with 300,000 scenarios.

Table 1 Objective value in Step 2 or 4, running time in Step 2 or 4, and superquantile sub-problem running time in Step 3 or 5 for each iteration in Algorithm 2 for \(\eta =120\) in constraint (46d)
Table 2 Slack, \(\delta _m\) (defined by (47)), in constraints (46b) for \(m=2\), 3, 4, 5, in Algorithm 2 for \(\eta =120\) in constraint (46d)
Table 3 Objective value in Step 2 or 4, running time in Step 2 or 4, and superquantile sub-problem running time in Step 3 or 5 for each iteration in Algorithm 2 for \(\eta =80\) in constraint (46d)
Table 4 Slack, \(\delta _m\) (defined by (47)), in constraints (46b) for \(m=2\), 3, 4, 5, in Algorithm 2 for \(\eta =80\) in constraint (46d)