Abstract
For a random variable, the superdistribution has emerged as a valuable probability concept. Like the cumulative distribution function (CDF), it uniquely defines the random variable and can be evaluated with a simple one-dimensional minimization formula. This work leverages the structure of that formula to introduce the buffered CDF (bCDF) and the reduced CDF (rCDF) for random vectors. bCDF and rCDF are shown to be the minimal Schur-convex upper bound and the maximal Schur-concave lower bound of the multivariate CDF, respectively. The special structure of bCDF and rCDF is used to construct an algorithm for solving optimization problems with bCDF and rCDF in the objective or constraints. The efficiency of the algorithm is demonstrated in a case study on optimization of a collateralized debt obligation with bCDF functions in constraints.
1 Introduction
The cumulative distribution function (CDF) of a random variable (r.v.) is a fundamental notion in probability theory and plays a central role in stochastic optimization, risk management, statistics, reliability theory and various applications. For instance, engineered systems, e.g., electrical grids and gas/oil pipelines, must be designed to comply with various safety and reliability regulations formulated in terms of the probability of failure calculated with CDFs. Also, smart electrical grids should be robust to rare events including power station faults and electromagnetic pulses, whereas gas pipeline operations should meet the consumer demand while being resilient to unforeseen production disruptions. The design of engineered structures and industrial systems involves multiple uncertain characteristics that are difficult to combine into a single loss function. For example, each watertight compartment of a submarine should be designed to maximize its survivability in case of an accident. Suppose that a submarine has eight compartments. The chances of failure of all compartments because of high outside pressure can be described by an eight-dimensional CDF accounting for failures of individual compartments (which are not independent because of common-cause factors: similar compartment design, outside water pressure, etc.). Another example is to ensure an uninterrupted power supply from a system of redundant emergency generators in case of a natural disaster. All generators may be impacted by the same factors, and failures of individual generators are not independent. Yet another example is to evaluate the probability that several escape roads will be blocked simultaneously because of common factors such as a hurricane. In all of these examples, the probability of an event should be assessed for a random vector rather than an r.v.
From an optimization perspective, one of the technical difficulties that engineers face when using a multivariate CDF is that for any fixed \(\textbf{x}\in {{\mathbb {R}}}^n\), the CDF \(F_\textbf{X}(\textbf{x})\) of a random vector \(\textbf{X}\) is not a convex function of \(\textbf{X}\). Moreover, CDFs based on observations (i.e., discrete distributions) are discontinuous piecewise constant functions of \(\textbf{x}\). In this case, stochastic optimization problems result in mixed-integer optimization problems, which are notoriously hard to solve for a large number of scenarios. Also, the use of CDF in decision problems has two conceptual shortcomings. The first is that at any fixed point x, the CDF of an r.v. X does not “capture” the extent of X below x: even if a tail has a small weight, the average value of X in the tail could be quite significant. As a result, reliance just on CDF may yield solutions with undesirable tail outcomes occurring with small probabilities. The other conceptual shortcoming is that in general, a safety first principle based on the CDF of X does not agree with a common perception that aggregation reduces volatility. This perception can be “captured” by the notion of convex order. Informally, an r.v. \( X \) dominates an r.v. \( Y \) in convex order, and we write \(X \geq _{cx} Y\), if the distribution of \( Y \) can be obtained from the distribution of \( X \) by aggregation. A formal definition is that \(X \geq _{cx} Y\) if \({{\mathbb {E}}}[g(X)] \geq {{\mathbb {E}}}[g(Y)]\) for any convex function \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\). \(X \geq _{cx} Y\) implies that \({{\mathbb {E}}}[X]={{\mathbb {E}}}[Y]\) and \(\sigma (X) \geq \sigma (Y)\), where \(\sigma (X)\) denotes standard deviation. This supports the perception that Y, being aggregated from X and having the same expected value as X, is less volatile than X. A functional f is called Schur-convex if \(X \geqslant _{cx} Y\) implies that \(f(X) \geqslant f(Y)\).
In other words, a Schur-convex functional does not increase after data aggregation. Examples of such functionals include \(f(X)=\sigma (X)\) and \(f(X)=-{{\mathbb {E}}}[u(X)]\) for a concave function u. However, \(f(X)=F_X(x)={\mathbb {P}}[X\leqslant x]\) with fixed x is not Schur-convex. As a result, if \(X^*\in \mathop {\mathrm{arg\,min}}_{X\in {{{\mathcal {X}}}}} F_X(x)\) for a feasible set \({{{\mathcal {X}}}}\) and \(x \in {\mathbb {R}}\), there can exist \(Y\in {{{\mathcal {X}}}}\) such that \(X^* \geqslant _{cx} Y\) and \(F_{X^*}(x)< F_Y(x)\). In this case, \({{\mathbb {E}}}[Y]={{\mathbb {E}}}[X^*]\) and \(\sigma (Y) \leqslant \sigma (X^*)\), but Y is “less safe” than \(X^*\).
In the one-dimensional case, i.e., for an r.v. X, all these CDF deficiencies were addressed with the functions of superquantile and buffered probability of exceedance (bPOE) [6]. Superquantile, which is also known as expected shortfall and conditional value-at-risk (CVaR) [14, 15] of X with confidence level \(\alpha\), is the expectation of the right \((1-\alpha )\)-tail of the distribution of X, i.e., it is the average of the largest outcomes with total probability \(1-\alpha\). Superquantile can be defined as follows
where \(q_X(s) = \inf \{x \in {\mathbb {R}}\,|\,F_X(x)>s\}\) is the quantile function and \(z^+\) denotes \(\max \{0,z\}\). bPOE is an extension of the buffered probability of failure [13] and is equal to 1 minus the inverse of superquantile, see [6],
It can also be evaluated by
see [6, 7]. Considered as a function of r.v. X for fixed x, bPOE is the minimal quasi-convex upper bound (see [6]) of probability of exceedance (POE),
Superquantile and bPOE constraints are equivalent, see [6]:
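For an equal-probability sample, the one-dimensional superquantile and bPOE can be evaluated in a few lines directly from the minimization formulas cited above. A minimal numerical sketch (the helper names `superquantile` and `bpoe` are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def superquantile(sample, alpha):
    """CVaR/superquantile via the minimization formula
    min_c { c + E[X - c]^+ / (1 - alpha) } for an equal-probability sample."""
    f = lambda c: c + np.maximum(sample - c, 0).mean() / (1 - alpha)
    # for alpha in (0, 1) the convex piecewise-linear objective attains its
    # minimum at a breakpoint, i.e., at one of the sample points
    return min(f(c) for c in sample)

def bpoe(sample, x):
    """bPOE via min_{a >= 0} E[a (X - x) + 1]^+ (one-dimensional case)."""
    f = lambda a: np.maximum(a * (sample - x) + 1, 0).mean()
    res = minimize_scalar(f, bounds=(0.0, 1e6), method='bounded')
    return min(res.fun, 1.0)  # a = 0 always yields the value 1
```

For the sample \(\{1,2,3,4\}\), the superquantile at \(\alpha =0.75\) is 4 (the average of the worst 25% of outcomes), and the bPOE at \(x=3.5\) is 0.5, since the largest 50% of outcomes average exactly 3.5.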
The aim of this work is to construct extensions of multivariate CDF, defined for the lower tail rather than the upper tail. The analogue of superquantile for the lower tail will be called subquantile:
To define an analog of bPOE for the lower tail, let an r.v. X with the CDF \(F_X:{\mathbb {R}}\rightarrow [0,1]\) represent a reward of some kind, e.g., money, energy, satisfied demand, etc. Then for a threshold \(\alpha \in (0,1]\), buffered CDF (bCDF) is defined as an inverse of subquantile, i.e., \({\overline{F}}_X(x)=\alpha\), where \(\alpha\) is a solution of the equation \(x={\underline{q}}_X(\alpha )\). In other words,
For every fixed x, bCDF \({\overline{F}}_X(x)\) is a quasi-convex function of r.v. X and is the minimal quasi-convex upper bound of \(F_X(x)\). This follows from the fact that bPOE is the minimal quasi-convex upper bound of POE [6]; see also [18, Eq’s (3.2.6)–(3.2.8)] and [2, Eq (30)]. bCDF admits a representation similar to (1):
Similar to the constraint equivalence (2) for the upper tail, (4) yields constraint equivalence for the lower tail:
Under certain conditions, the maximum in the last constraint can be dropped, and the constraint takes the form \(c-\alpha ^{-1}{\mathbb {E}}[c-X]^+\geqslant x\), which can be reformulated as linear constraints for discrete distributions. However, the main advantage of bCDF compared to CDF is that bCDF describes the distribution tails. For instance, when r.v. \(X\) represents a portfolio value and depends on the portfolio weights (decision variables), minimization of \({\overline{F}}_X(x)\) yields an optimal solution \(X^*\) with \({\overline{F}}_{X^*}(x)=\alpha ^*\) such that the average value of \(X^*\) in \(\alpha ^*\cdot 100\%\) of the worst outcomes is at least x. Under some mild conditions, optimization of one-dimensional bCDF can be reduced to convex and linear programming (for discrete distributions). For example, let \({{\textbf{X}}}= (X_1,\dots ,X_m)\) be a known random vector, \(x\in {\mathbb {R}}\) a given threshold, \({{\textbf{w}}}=( w_1,\dots ,w_m) \in {\mathbb {R}}^m\) an unknown vector, and \(X={{\textbf{w}}}^{\! \top } {{\textbf{X}}}= w_1 X_1 +\cdots + w_m X_m\). Then, the problem of minimizing \({\overline{F}}_X(x)\) with respect to \({{\textbf{w}}}\) can be formulated with (5) as
where \({\textbf{c}} =a\,{{\textbf{w}}}\) is a new decision vector. The last optimization problem can be reduced to convex and linear programming for a discrete distribution as in [6]. This also follows from a simple observation that
Rockafellar and Royset [12] introduced super CDF (sCDF) as the inverse of superquantile, i.e., \({\underline{F}}_{X}(x)={\overline{q}}^{\,-1}_X (x) = 1- {\overline{p}}_X(x)\), which can be shown to be the maximal quasi-concave lower bound of CDF. It follows from (1) that
We will refer to super CDF as reduced CDF (rCDF) to emphasize that this function is a lower bound of CDF, as opposed to buffered CDF, which is an upper bound of CDF.
It follows from (5) and (7) that bCDF and rCDF are related by
Generalizing bCDF and rCDF for random vectors is not straightforward. The difficulty is that in the multivariate case, the notions of quantile and its inverse are not well-defined. A potential insight can be offered through exploring the properties of (5) and (7). This work generalizes bCDF and rCDF given by (5) and (7) to random vectors. In contrast to the one-dimensional case, the multivariate bCDF and rCDF are not quasi-convex and quasi-concave functions of a random vector. We show that they are the minimal Schur-convex upper bound and the maximal Schur-concave lower bound of the multivariate CDF, respectively, and that their special structure allows us to construct an algorithm that can quickly find their local extrema. We demonstrate the efficiency of the algorithm in a case study on optimization of a collateralized debt obligation with bCDF functions in constraints.
The paper is organized into five sections and five appendices. Section 2 introduces bCDF and rCDF and shows that they are upper and lower bounds for a multivariate CDF. Section 3 considers bCDF and rCDF in optimization problems. Section 4 presents applications of bCDF and rCDF. Section 5 discusses the case study. Appendices A, B and C present proofs of three propositions and Appendices D and E summarize algorithms for solving optimization problems with bCDF and rCDF in objective function and constraints.
2 The lower and upper bounds for multivariate CDF
Let \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) be a probability space, where \(\Omega\) is an arbitrary non-empty set, \({{{\mathcal {F}}}}\) is the \(\sigma\)-algebra of subsets of \(\Omega\), and \({\mathbb {P}}\) is a probability measure on \((\Omega ,{{{\mathcal {F}}}})\). Sets in \({{{\mathcal {F}}}}\) are called events. A probability space \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) is called atomless if there exists an r.v. on it with a continuous CDF.
An n-dimensional random vector \({{\textbf{X}}}=(X_1, \dots , X_n)\) is a function \({{\textbf{X}}}:\Omega \rightarrow {\mathbb {R}}^n\) such that for every \({{\textbf{x}}}\in {\mathbb {R}}^n\) the set \(\{\omega \in \Omega \,|\,{{\textbf{X}}}(\omega ) \leqslant {{\textbf{x}}}\}\) is an event, and \(X_1, \dots , X_n\) are called r.v.’s. On an atomless probability space, there exists a collection of random vectors with any given joint distribution. Let \(L^{1,n}(\Omega )\) and \(L^{\infty ,n}(\Omega )\) denote the sets of n-dimensional random vectors \({{\textbf{X}}}\) for which \({{\mathbb {E}}}[{{\textbf{X}}}]=\int _\Omega {{\textbf{X}}}(\omega )d{\mathbb {P}}\) and \(\max\limits_i \sup |{{X_i}}|\) exist and are finite, respectively.
Definition 1
A random vector \({{\textbf{X}}}\in L^{1,n}(\Omega )\) dominates random vector \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) in convex order, and we write \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), if \({{\mathbb {E}}}[g({{\textbf{X}}})] \geqslant {{\mathbb {E}}}[g({{\textbf{Y}}})]\) for any convex function \(g:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\).
The convexity of g implies the existence of \({{\textbf{a}}}\in {\mathbb {R}}^n\) and \(b \in {\mathbb {R}}\) such that \(g({{\textbf{x}}}) \geqslant {{\textbf{a}}}^\top {{\textbf{x}}} + b\) for all \({{\textbf{x}}}\in {\mathbb {R}}^n\), so that \({{\mathbb {E}}}[g({{\textbf{X}}})] \geqslant {{\mathbb {E}}}[{{\textbf{a}}}^\top {{\textbf{X}}}+ b] = {{\textbf{a}}}^\top {{\mathbb {E}}}[{{\textbf{X}}}] + b > -\infty\) is well-defined for every \({{\textbf{X}}}\in L^{1,n}(\Omega )\), although it can be \(+\infty\).
Definition 2
A function \(f:L^{1,n}(\Omega ) \rightarrow {{\mathbb {R}}}\) is Schur-convex if \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) implies \(f({{\textbf{X}}}) \geqslant f({{\textbf{Y}}})\). A function f is Schur-concave if \(-f\) is Schur-convex.
Definition 2 implies that any function in the form \(f({{\textbf{X}}})={{\mathbb {E}}}[g({{\textbf{X}}})]\) is Schur-convex and Schur-concave if g is convex and concave, respectively. Any function f has the unique minimal Schur-convex upper bound given by
see [6], as well as the unique maximal Schur-concave lower bound
Let
denote multivariate CDF of \({{\textbf{X}}}\).
Definition 3
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), buffered CDF (bCDF) and reduced CDF (rCDF) are defined by
respectively, where \({{\textbf{a}}}=(a_1, \dots , a_n)\), \({{\textbf{a}}}^{\! \top } {{\textbf{X}}}=\sum _{i=1}^n a_i X_i\), and \({\mathbb {R}}^n_+=\{{{\textbf{a}}}\in {\mathbb {R}}^n, \, {{\textbf{a}}}\geqslant 0\}\).
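For a discrete distribution with equally likely scenarios, both bounds in Definition 3 reduce to linear programs via the standard epigraph linearization of \([\,\cdot \,]^+\). A minimal sketch, assuming (9) and (10) take the forms \(\inf _{{{\textbf{a}}}\geqslant 0}{\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{x}}}-{{\textbf{X}}})+1]^+\) and \(1-\inf _{{{\textbf{a}}}\geqslant 0}{\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{X}}}-{{\textbf{x}}})+1]^+\) (the expressions appearing in the proofs of Propositions 2.1 and 2.4); the function names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def bcdf(X, x):
    """bCDF of equal-probability scenarios X (m x n) at threshold x (n,):
    min over a >= 0 of (1/m) sum_j [a^T (x - X_j) + 1]^+, linearized with
    epigraph variables t_j >= a^T (x - X_j) + 1, t_j >= 0."""
    X, x = np.asarray(X, float), np.asarray(x, float)
    m, n = X.shape
    c = np.concatenate([np.zeros(n), np.full(m, 1.0 / m)])  # minimize mean t
    A_ub = np.hstack([x - X, -np.eye(m)])  # a^T (x - X_j) - t_j <= -1
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + m))
    return res.fun

def rcdf(X, x):
    """rCDF as 1 minus the analogous minimization with X - x in place of
    x - X, i.e., 1 - bcdf(-X, -x)."""
    return 1.0 - bcdf(-np.asarray(X, float), -np.asarray(x, float))
```

For example, for two-dimensional scenarios \((-1,-1)\), (1, 1), (1, 1) (equally likely) and \({{\textbf{x}}}={{\textbf{0}}}\), this gives \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=2/3\) and \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=0\), bracketing \(F_{{{\textbf{X}}}}({{\textbf{0}}})=1/3\) in agreement with Proposition 2.1.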
Remark
With an auxiliary variable \(a_0 \geqslant 0\) and the following set (simplex)
bCDF and rCDF, defined by (9) and (10), respectively, can be recast as
provided that the infima in (12) and (13) with respect to \(a_0\geqslant 0\) are attained. Representations (12) and (13) show that bCDF and rCDF of a random vector \({{\textbf{X}}}\) are, in fact, bPOE of corresponding r.v.’s minimized with respect to the vector \({{\textbf{a}}}\) over the simplex \({{\textbf{A}}}\). These representations are important in devising efficient algorithms for solving optimization problems with bCDF and rCDF. If \({{\textbf{X}}}\) depends on decision variables, optimal decision problems involving bCDF and rCDF in either the objective function or constraints are bilevel optimization problems, see Sect. 3.
First we show that bCDF and rCDF are the upper and lower bounds for CDF, respectively.
Proposition 2.1
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\) and any \({{\textbf{x}}}\in {\mathbb {R}}^n\),
Proof
Let \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) be arbitrary, and let
be the event that \({{\textbf{X}}}(\omega )< {{\textbf{x}}}\). Then for any \(\omega \not \in A\) we have \(X_i(\omega )\geqslant x_i\) for some i, so that
where \(I_A\) is the indicator function of the event A. Also, for \(\omega \in A\),
and consequently,
Since \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) is arbitrary, the last inequality and (10) yield the first inequality in (14). Similarly, if
is the event that \({{\textbf{X}}}(\omega )\leqslant {{\textbf{x}}}\), then for any \(\omega \in B\),
and for any \(\omega \not \in B\),
so that
which along with (9) yields the last inequality in (14). \(\square\)
It will be shown later that bCDF and rCDF are the minimal Schur-convex upper bounds of \(F_{{{\textbf{X}}}}({{\textbf{x}}})\) and the maximal Schur-concave lower bounds of \({\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]\), respectively.
For an r.v. X,
where \({\overline{p}}_X(x)\) is the (one-dimensional) bPOE defined in (1).
Also, note that
Note that \(F_{{\textbf{X}}}(\textbf{x})\) can be represented by
where “\(W \in \{0,1\}\)” indicates that the supremum is over all random variables such that \(W(\omega )\in \{0,1\}\) for every \(\omega \in \Omega\), while the constraint \(X_i W \leqslant x_i W\) means that \(X_i(\omega ) W(\omega ) \leqslant x_i W(\omega )\) for all \(\omega \in \Omega\). The supremum in (16) is attained when W is the indicator function of the event \({{\textbf{X}}}\leqslant \textbf{x}\). Similarly, bCDF has the following interpretation through dual characterization.
Proposition 2.2
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\) and any \({{\textbf{x}}}\in {\mathbb {R}}^n\),
where \(W\in [0,1]\) means that \(0 \leqslant W(\omega ) \leqslant 1\) for all \(\omega \in \Omega\).
Proof
See Appendix A. \(\square\)
Proposition 2.3
If the underlying probability space \((\Omega , {{{\mathcal {F}}}}, {\mathbb {P}})\) is atomless, then \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is the minimal Schur-convex upper bound for \(F_{{{\textbf{X}}}}({{\textbf{x}}})\).
Proof
See Appendix B. \(\square\)
When \(n=1\), \({\overline{F}}_X(x)\) is also the unique minimal quasi-convex upper bound for \(F_X(x)\), see [6]. This is not true when \(n\geqslant 2\).
Proposition 2.4
In general (when \(n\geqslant 2\)), \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is not quasi-convex in \({{\textbf{X}}}\).
Proof
Let \(n\geqslant 2\), \({{\textbf{0}}}=(0,0,\dots ,0)\), \({{\textbf{x}}}_0=(1,-1,0,\dots ,0)\in {\mathbb {R}}^n\) and \({{\textbf{X}}}\) be such that \({\mathbb {P}}[{{\textbf{X}}}={{\textbf{x}}}_0]=1\). Then \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})=\min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} {\mathbb {E}}[{{\textbf{a}}}^\top ({{\textbf{0}}}-{{\textbf{X}}}) + 1]^+ = \min _{{{\textbf{a}}}\in {\mathbb {R}}^n_+} [-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+ = 0\), where the last equality follows from \([-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+\geqslant 0\) and that \([-{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+=0\) for \({{\textbf{a}}}=(1,0,\dots ,0)\). Similarly, for \({{\textbf{a}}}=(0,1,0\dots ,0)\), it is valid \([{{\textbf{a}}}^\top {{\textbf{x}}}_0 + 1]^+=0\), which yields \({\overline{F}}_{-{{\textbf{X}}}}({{\textbf{0}}})=0\). However, \({\overline{F}}_{({{\textbf{X}}}+(-{{\textbf{X}}}))/2}({{\textbf{0}}})={\overline{F}}_{{{\textbf{0}}}}({{\textbf{0}}})=1>0\). Consequently, \({\overline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is not quasi-convex. \(\square\)
Remark
The proof of Proposition 2.4 uses constant random vectors. Suppose that \(n=2\) and that \({{\textbf{X}}}\) is a constant r.v., i.e., \({\mathbb {P}}[{{\textbf{X}}}=(x_1,x_2)]=1\). Let
If \(x_1 \leqslant 0\) and \(x_2 \leqslant 0\), then \(a_1(-x_1)+a_2(-x_2) + 1 \geqslant 1\), so that \(g(x_1,x_2)=1\). If \(x_1>0\) then for \(a_1=1/x_1\) and \(a_2=0\), we have \(a_1(-x_1)+a_2(-x_2) + 1 = 0\), and consequently, \(g(x_1,x_2)=0\). Similarly, if \(x_2>0\) then \(g(x_1,x_2)=0\). Thus, \(g(x_1,x_2)=1\) when \(x_1\leqslant 0\) and \(x_2 \leqslant 0\), and \(g(x_1,x_2)=0\) otherwise. This function is not quasi-convex and cannot be made quasi-convex by altering it values at a finite number of points.
Now let \(I=\{\left. (a'_1, a'_2)\right| a'_1 \geqslant 0, \, a'_2 \geqslant 0, \, a'_1 + a'_2 = 1\}\) be the interval with endpoints (1, 0) and (0, 1). Then \(g(x_1,x_2)\) can be represented as
where
Observe that \(h_{a'_1,a'_2}(x_1,x_2)=1\) when \(a'_1x_1 + a'_2x_2 \leqslant 0\) and \(h_{a'_1,a'_2}(x_1,x_2)=0\) otherwise, and that \(h_{a'_1,a'_2}(x_1,x_2)\) is quasi-convex. However, the infimum of such functions over \((a'_1,a'_2) \in I\) is not quasi-convex. As \((a'_1,a'_2)\) moves along I from (1, 0) to (0, 1), the line \(a'_1x_1 + a'_2x_2=0\) rotates from the y-axis to the x-axis. As a result, \(g(x_1,x_2) = 1\) if and only if the point \((x_1,x_2)\) remains “under” the rotating line, which occurs if and only if \(x_1 \leqslant 0\) and \(x_2 \leqslant 0\).
Note that \(g(x_1,x_2)\) is the infimum of quasi-convex functions \(h_{a'_1,a'_2}(x_1,x_2)\), but \(g(x_1,x_2)\) is not “closely approximated” by any of these functions. Also, every function \(h_{a'_1,a'_2}(x_1,x_2)\) is a quasi-convex upper bound for \(g(x_1,x_2)\). However, none of these quasi-convex upper bounds is “best” or “minimal”. So, \(g(x_1,x_2)\) has many quasi-convex upper bounds, but the unique “minimal” quasi-convex upper bound does not exist.
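The failure of quasi-convexity can be checked mechanically from the closed form of \(g(x_1,x_2)\) derived above; a toy sketch:

```python
def g(x1, x2):
    """Closed form derived in the remark: g = 1 on the third quadrant
    (x1 <= 0 and x2 <= 0) and g = 0 otherwise."""
    return 1.0 if (x1 <= 0 and x2 <= 0) else 0.0

# Quasi-convexity would require g at a midpoint to be at most the maximum
# of g at the segment's endpoints; the segment from (1, -1) to (-1, 1)
# violates this at its midpoint (0, 0).
p, q = (1.0, -1.0), (-1.0, 1.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)   # = (0, 0)
assert g(*mid) > max(g(*p), g(*q))             # 1 > 0: not quasi-convex
```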
In fact, \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is also not quasi-concave in \({{\textbf{X}}}\), even for \(n=2\), see [9]. It can be interpreted through dual characterization in a way similar to that of bCDF. Indeed,
Proposition 2.5
Define
Also, let
be the maximal Schur-concave lower bound for \({\mathbb {P}}[{{\textbf{X}}}< {{\textbf{x}}}]\). Then on an atomless probability space,
Proof
See Appendix C. \(\square\)
For illustration of these results, let r.v.’s \(X_1\) and \(X_2\) model profits from two different investments and let an investor evaluate the probability of both \(X_1\) and \(X_2\) to be non-positive. Let \({{\textbf{X}}}=(X_1,X_2)\), \({{\textbf{0}}}=(0,0)\), and let \(F_{{{\textbf{X}}}}({{\textbf{0}}})={\mathbb {P}}[{{\textbf{X}}}\leqslant {{\textbf{0}}}]\). Let \({{\textbf{X}}}\) assume values \((-1,-1)\), (1, 1) with probabilities 1/3 and 2/3, respectively. Then \(F_{{{\textbf{X}}}}({{\textbf{0}}})=1/3\). Suppose \({{\textbf{X}}}\) takes values \((-1,-1)\), (1, 1), and (1, 1) at the elementary events \(\omega _1\), \(\omega _2\) and \(\omega _3\) with probabilities \({\mathbb {P}}[\omega _1]={\mathbb {P}}[\omega _2]={\mathbb {P}}[\omega _3]=1/3\). Let \({{{\mathcal {F}}}}=\{A,B\}\) be a partition of the probability space by the events \(A=\{\omega _1,\omega _2\}\) and \(B=\{\omega _3\}\). Then \({{\textbf{Y}}}={\mathbb {E}}[{{\textbf{X}}}|{{{\mathcal {F}}}}]\) is a random vector equal to (0, 0) and (1, 1) with probabilities 2/3 and 1/3, respectively. Note that \({{\textbf{Y}}}\) is an “averaged” version of \({{\textbf{X}}}\). However, we have \(F_{{{\textbf{Y}}}}({{\textbf{0}}})=2/3 > 1/3 = F_{{{\textbf{X}}}}({{\textbf{0}}})\). This means that \(F_{{{\textbf{X}}}}({{\textbf{0}}})\) is not Schur-convex as a function of \({{\textbf{X}}}\). Instead, we suggest using its Schur-convex upper bound given in Proposition 2.3; such a bound does not increase when \({{\textbf{X}}}\) is replaced by its averaged-out version.
This example also demonstrates convenience of modeling the probability space as atomless even for discrete random variables. If the original random vector \({{\textbf{X}}}\) were defined on a discrete probability space \(\Omega =\{\omega _1,\omega _2\}\) with \({\mathbb {P}}[\omega _1]=1/3\) and \({\mathbb {P}}[\omega _2]=2/3\), the “averaged-out” version \({{\textbf{Y}}}\) of \({{\textbf{X}}}\) would not exist. For this \(\Omega\),
which illustrates that Proposition 2.3 fails on discrete probability spaces.
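The numbers in this example are easy to verify programmatically; a small sketch with the scenario data above:

```python
import numpy as np

def cdf(S, p, x):
    """Discrete multivariate CDF P[S <= x] for scenarios S with probabilities p."""
    return p[(S <= x).all(axis=1)].sum()

X, pX = np.array([[-1.0, -1.0], [1.0, 1.0]]), np.array([1/3, 2/3])
# Y = E[X | F]: averaging the two equal-probability copies of the scenarios
# at omega_1 and omega_2 yields (0, 0) with probability 2/3
Y, pY = np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([2/3, 1/3])

x0 = np.zeros(2)
# aggregation increased the CDF value (1/3 -> 2/3): the CDF is not Schur-convex
assert cdf(X, pX, x0) < cdf(Y, pY, x0)
```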
3 Buffered and reduced CDFs in optimization problems
Let \({{\textbf{X}}}\) be a random vector, \({{\textbf{x}}}\in {\mathbb {R}}^n\), and let \(\textbf{f}({{\textbf{w}}};{{\textbf{X}}})= (f_1({{\textbf{w}}};{{\textbf{X}}}),\ldots , f_n({{\textbf{w}}};{{\textbf{X}}}))\) be a vector of concave functions of \({{\textbf{w}}}\) and \({\textbf{h}}({{\textbf{w}}};{{\textbf{X}}})= (h_1({{\textbf{w}}};{{\textbf{X}}}),\ldots ,h_n({{\textbf{w}}};{{\textbf{X}}}))\) be a vector of convex functions of \({{\textbf{w}}}\), defined on a convex set \({{\textbf{W}}}\subseteq {\mathbb {R}}^k\). Let also \({\mathbb {E}}[|f_i ({{\textbf{w}}}; {{\textbf{X}}})|] < \infty\), \({\mathbb {E}}[|h_i ({{\textbf{w}}}; {{\textbf{X}}})|] < \infty\) for all \({{\textbf{w}}}\in {{\textbf{W}}}\) and \(i\in \{1,\dots ,n\}\).
If all \(f_i\) in the definition of \({\textbf{f}}({{\textbf{w}}};{{\textbf{X}}})\) are convex, and \({{{\mathcal {W}}}}= {{{\mathcal {W}}}}_1 \times \cdots \times {{{\mathcal {W}}}}_n\) with all \({{{\mathcal {W}}}}_1,\dots ,{{{\mathcal {W}}}}_n\) being convex, then the problem \(\min _\mathbf{w \in {{{\mathcal {W}}}}} {\overline{F}}_\mathbf{f(w;X)}(\textbf{x})\) is convex.
3.1 bCDF and rCDFs in objective
This section considers the following two optimization problems
and
With functions \(G_{\! F}({{\textbf{a}}}, {{\textbf{w}}})\) and \(G_{\! H}({{\textbf{a}}},{{\textbf{w}}})\) defined by
and
bCDF and rCDF can be recast as
and
It is assumed that in (20) and (21), the infimum and supremum with respect to \({{\textbf{a}}}\) can be replaced by the minimum and maximum, respectively. With this assumption, optimization problems (18) and (19) along with (20) and (21) take the form
and
respectively, and (22) and (23) can be written as
Further, with an auxiliary variable \(a_0 \geqslant 0\) and with the set \({{\textbf{A}}}\) defined by (11), problem (24) can be equivalently reformulated in the form
The idea is to solve (25) iteratively by alternating between the variables \((a_0, {{\textbf{w}}})\) and \({{\textbf{a}}}\): solve (25) with respect to \((a_0, {{\textbf{w}}})\), then with respect to \({{\textbf{a}}}\), then again with respect to \((a_0, {{\textbf{w}}})\), and so on. This method is known as coordinate descent or, more generally, Block Coordinate Descent (BCD), in which variables are divided into m blocks, and optimization is done iteratively for one block at a time. BCD may not converge to stationary points of a non-convex objective function, even one that is convex in each block of coordinates [10]. However, global convergence was studied under additional assumptions: the two-block case (\(m=2\)) or strict quasiconvexity in \(m-2\) blocks [3, 4] and uniqueness of the minimizer in each block [1, Sect. 2.7]. See also [5] for convergence of BCD for nonconvex problems and for relevant references.
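A schematic two-block version of this alternation can be sketched as follows (a generic illustration with hypothetical names, not the algorithm of Appendix D; each `minimize` call stands in for the corresponding convex subproblem solve):

```python
import numpy as np
from scipy.optimize import minimize

def bcd_two_blocks(obj, w0, a0, max_iter=50, tol=1e-7):
    """Alternate solves over the two blocks until the objective stops
    improving; obj(w, a) is assumed convex in each block separately."""
    w, a = np.asarray(w0, float), np.asarray(a0, float)
    prev = obj(w, a)
    for _ in range(max_iter):
        w = minimize(lambda v: obj(v, a), w).x          # block (a0, w)
        a = minimize(lambda v: obj(w, v), a,
                     bounds=[(0, None)] * a.size).x     # block a >= 0
        cur = obj(w, a)
        if prev - cur < tol:
            break
        prev = cur
    return w, a
```

For a smooth separable test objective such as \((w-1)^2+(a-2)^2\) the scheme converges in one pass; for the piecewise-linear bPOE subproblems of this section, each block solve would instead be the convex/LP reformulation discussed in the text.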
Note that in our case the optimization problems
are convex with respect to \({{\textbf{a}}}\), since the objective is the maximum of linear functions with respect to components of the vector \({{\textbf{a}}}\). On the other hand, the optimization problems
can be reduced to convex programming in \({{\textbf{w}}}\), since they are bPOE minimization problems. Indeed,
Since all components of the vector function \({\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}})\) are concave in \({{\textbf{w}}}\), the function \({{\textbf{a}}}^\top ({{\textbf{x}}}-{\textbf{f}}({{\textbf{w}}}; {{\textbf{X}}}))\) is convex in \({{\textbf{w}}}\). Consequently, minimization of bPOE with respect to \({{\textbf{w}}}\) for this function can be reduced to convex programming, see [6]. Also, (15a) and (15b) imply that
Since all components of the vector function \({\textbf{h}}({{\textbf{w}}}; {{\textbf{X}}})\) are convex in \({{\textbf{w}}}\), and components of the vector \({{\textbf{a}}}\) are nonnegative, the function
is convex in \({{\textbf{w}}}\). Consequently, minimization of bPOE for this function can be reduced to convex programming, see [6]. Finally, with bPOE functions, problem (25) can be formulated as follows
where
The algorithm for solving problem (28) is summarized in Appendix D.
3.2 bCDF and rCDF in constraints
This section considers minimization of an objective function \(V({{\textbf{w}}})\) with respect to variables \({{\textbf{w}}}\in {\mathbb {R}}^k\):
subject to constraints on buffered and reduced CDFs:
With the functions \(G_{\! F}(a_F \,{{\textbf{a}}}_F,{{\textbf{w}}})\) and \(G_{\! H}(a_H \,{{\textbf{a}}}_H,{{\textbf{w}}})\) introduced in Sect. 3.1 and under the assumption that in (20) and (21), the infimum and supremum with respect to \({{\textbf{a}}}\) can be replaced by the minimum and maximum, respectively, problem (29)–(31) can be reformulated as follows
subject to
Relationship (26) implies that the constraint (33) with respect to variables \((a_F, {{\textbf{a}}}_F, {{\textbf{w}}})\) can be replaced by
with respect to variables \(({{\textbf{a}}}_F, {{\textbf{w}}})\). By (2), the last constraint is equivalent to the superquantile constraint:
Similarly, (27) implies that the constraint (34) with respect to variables \((a_H, {{\textbf{a}}}_H, {{\textbf{w}}})\) can be replaced by
with respect to variables \(({{\textbf{a}}}_H, {{\textbf{w}}})\). By (2), the last constraint is equivalent to the superquantile constraint
Finally, problem (32)–(34) can be recast in the form
Problem (37) can be solved iteratively as follows: solve (37) with respect to variables \({{\textbf{w}}}\) with fixed \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), then minimize the subquantile functions in (35) and (36) with respect to \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), respectively, i.e., solve
then solve (37) again with respect to \({{\textbf{w}}}\) with previously found \({{\textbf{a}}}_F\) and \({{\textbf{a}}}_H\), then solve (38) and (39), and so on.
The described algorithm for solving problem (37) is summarized in Appendix E.
4 Applications
4.1 Two-asset option
A two-asset option is an exotic option whose payoff depends on prices \(p_A(t)\) and \(p_B(t)\) of assets A and B at time t. For example, it may have nonzero payoff if \(p_A(T)\) and \(p_B(T)\) simultaneously exceed some strike prices \(C_A\) and \(C_B\), respectively, at some future time T. In this case, if \(p_A(T)\) and \(p_B(T)\) are assumed to be random variables, the option pays with the probability
Since in general, \(p_A(T)\) and \(p_B(T)\) are not independent random variables, \({\mathbb {P}}[p_A (T)\geqslant C_A, p_B(T) \geqslant C_B]\not ={\mathbb {P}}[p_A(T) \geqslant C_A] \cdot {\mathbb {P}}[p_B(T) \geqslant C_B]\). If \(p_A(0)\) and \(p_B(0)\) are (known) prices of assets A and B at the current time \(t=0\), then (40) can be recast in terms of adjusted rates of returns \(r_A=(p_A(T)-C_A)/p_A(0)\) and \(r_B=(p_B(T)-C_B)/p_B(0)\):
The joint distribution of \(r_A\) and \(r_B\) can be estimated from the historical data. Let \(r_{A,i}\) and \(r_{B,i}\), \(i=1,\dots , T\), be the adjusted rates of return of assets A and B over the last T periods. Then p can be estimated as the number of times \((r_{A,i},r_{B,i})\geqslant 0\) divided by T. However, this estimate is sensitive to small perturbations in the data. Alternatively, we can define \({{\textbf{X}}}=(-r_A,-r_B)\) and estimate the smallest Schur-convex upper bound (9) for p:
which is a linear program.
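A sketch of both estimates on a small set of hypothetical return scenarios (the LP uses the same epigraph linearization of \([\,\cdot \,]^+\) as in bPOE computations [6]; the data and helper names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def payoff_prob_and_bound(rA, rB):
    """Empirical estimate of p = P[r_A >= 0, r_B >= 0] together with the
    Schur-convex upper bound min_{a >= 0} E[a1 r_A + a2 r_B + 1]^+,
    solved as an LP with epigraph variables t_j (equal scenario weights)."""
    R = np.column_stack([rA, rB])                  # scenarios of (r_A, r_B)
    m = len(R)
    p_hat = np.mean((R >= 0).all(axis=1))
    c = np.concatenate([np.zeros(2), np.full(m, 1.0 / m)])
    A_ub = np.hstack([R, -np.eye(m)])              # a^T R_j - t_j <= -1
    b_ub = -np.ones(m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 + m))
    return p_hat, res.fun

p_hat, bound = payoff_prob_and_bound(
    rA=[-2.0, 1.0, -3.0, 0.5], rB=[-1.0, 2.0, -2.0, 1.0])
assert p_hat <= bound <= 1.0 + 1e-9   # upper-bound property of Proposition 2.1
```

Note that the bound is informative (strictly below 1) only when the mean of \(-r_A\) or \(-r_B\) is positive, i.e., when the option is, on average, out of the money in the historical sample.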
4.2 Credit rating
Optimizing a portfolio of credit default swaps (CDSs) with constraints on the default probabilities of tranches [8] is a non-convex optimization problem. Let M be the number of tranches, T the number of time periods, and \(x_m^t\), \(m=1,\dots , M\), \(t=1,\dots ,T\), the attachment point of tranche m at time t. Also, suppose there are K available CDSs. Let \(\theta _k^t\) be the cumulative loss of k-th CDS at time t, and let \(y_k\) be the weight of the k-th asset in the asset pool. Then the total loss of the CDS pool at time t is \(L(\varvec{\theta }^t\!\!, {{\textbf{y}}}) = \sum _{k=1}^K \theta _k^t y_k\), where \(\varvec{\theta }^t = (\theta ^t_1,\dots , \theta ^t_K)\) and \({{\textbf{y}}}=(y_1,\dots , y_K)\). Consider constraints on the default probabilities of tranches [8]:
With \(X_t = L(\varvec{\theta }^t,{{\textbf{y}}})-x_m^t\), \(t=1,\dots ,T\), (41) can be recast as
Each constraint in (42) is non-convex and can be approximated by
where \({\overline{p}}_m\) is some scaled bound on probability; see [8]. Also, (15b)–(15d) imply that
where \({\underline{F}}_{{{\textbf{X}}}}({{\textbf{x}}})\) is defined in (10). Consequently, \(1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\) is a better upper bound for \({\mathbb {P}}[\max (X_1, \dots ,X_T) \geqslant 0]\) than \({\overline{p}}_{\max \{X_1, \dots , X_T\}}(0)\), and (43) can be replaced by
In fact, \(1-{\underline{F}}_{{{\textbf{X}}}}({{\textbf{0}}})\) is the best Schur-convex upper bound on this probability for an atomless probability space; see Proposition 2.5. The optimization problem considered in [8] is formulated by
where \({{\textbf{x}}}=\{x_m^t\}_{m=2,\dots ,M}^{t=1,\dots ,T}\), r is the interest rate, \(\Delta s_m = s_m - s_{m-1}\), \(s_m\) is the spread payment for each tranche \(m=1, \dots , M\), \(c_k\) is the annual income spread payment of the k-th CDS, \(k=1, \dots , K\), and \(\eta\) is a real-valued parameter. After replacing constraints (41) by (44), the optimization problem (45) takes the form
where \({{\textbf{a}}}=\{a_m^t\}_{m=2,\dots ,M}^{t=1,\dots ,T}\).
5 Case study
Veremyev et al. [17] solved the CDS portfolio problem (45) and found an optimal portfolio of CDSs along with optimal attachment/detachment points over a multi-period horizon. Pertaia et al. [8] replaced the multivariate POE constraints in (45) by one-dimensional bPOE constraints (43) and solved the corresponding optimization problem (Footnote 5).
Here we use Algorithm 2 to solve problem (46a)–(46e) for 53 CDSs over 5 time periods with 5 tranches for 300,000 simulated scenarios and for \(\eta =120\) and \(\eta =80\) in (46d) (Footnote 6). The experiments were run on an Intel i7 2.6 GHz processor with 32 GB of DDR4-3200 MHz RAM and used Portfolio Safeguard (PSG; Footnote 7), which has a precoded superquantile (CVaR) function as well as many other popular risk functions.
Table 1 presents the objective value in Step 2 or 4, the running time in Step 2 or 4, and the running time of the subquantile sub-problem in Step 3 or 5 for each iteration of Algorithm 2 with \(\eta =120\) and \(\epsilon =10^{-7}\). It shows that no improvements are made after the second iteration (Algorithm 2 stops after the third iteration, but Table 1 shows one more iteration to demonstrate that there are no further improvements in the objective value). Constraint (36) in problem (37), which is solved in Steps 2 and 4, is equivalent to constraints (46b) with \(m=2\), 3, 4, 5. Table 2 reports the slack in constraints (46b) for \(m=2\), 3, 4, 5 as a percentage of the constraint right-hand side, i.e.,
At Step 2 in the first iteration, constraints (46b) for \(m=3\), 4, 5 are active; after that, all constraints in (46b) remain inactive until the algorithm stops after the third iteration. Since problem (46a)–(46e) without constraints (46b) is convex and none of the constraints (46b) is active at optimality, the solution found is globally optimal. Note that the one-dimensional bPOE constraints (43) are active at Step 2 in the first iteration.
Tables 3 and 4 present results similar to those in Tables 1 and 2 but for \(\eta =80\). This time, the objective value stops improving after the second iteration even though constraints (46b) for \(m=3\), 4, 5 remain active in all iterations.
The case study shows that optimization problems with bCDF functions can be solved efficiently, even at a large scale with 300,000 scenarios.
Notes
Here, “safety first principle based on the CDF” means that for a given threshold \(x \in {{\mathbb {R}}}\), an r.v. X is preferred to an r.v. Y, if and only if \(F_X(x)={\mathbb {P}}[X\leqslant x]\) does not exceed \(F_Y(x)={\mathbb {P}}[Y\leqslant x]\). For example, let \({\mathbb {P}}[X=-1]=1/2\) and \({\mathbb {P}}[X=1]=1/2\). Then for \(x=0\), \({\mathbb {P}}[X\leqslant 0]={\mathbb {P}}[-X\leqslant 0]=1/2\), whereas \({\mathbb {P}}[(X+(-X))/2\leqslant 0]={\mathbb {P}}[0\leqslant 0]=1>1/2\). Consequently, both X and \(-X\) are (strictly) preferred to \((X+(-X))/2\), contrary to the perception that aggregation reduces risk.
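The footnote's arithmetic can be checked mechanically. A minimal sketch, with X the two-point r.v. from the footnote represented by equally likely scenarios:

```python
import numpy as np

# X takes values -1 and 1 with probability 1/2 each (equally likely scenarios)
X = np.array([-1.0, 1.0])
Y = -X              # -X has the same distribution as X
Z = (X + Y) / 2.0   # the 50/50 aggregate, identically 0

F_X0 = np.mean(X <= 0.0)  # P[X <= 0]  = 1/2
F_Y0 = np.mean(Y <= 0.0)  # P[-X <= 0] = 1/2
F_Z0 = np.mean(Z <= 0.0)  # P[0 <= 0]  = 1
print(F_X0, F_Y0, F_Z0)
```

Under the safety-first criterion at threshold 0, both X and \(-X\) (CDF value 1/2) are preferred to the aggregate (CDF value 1), as the footnote states.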
Inequalities between vectors hold component-wise.
As usual, the indicator function of the event A is the random variable \(I_A\) such that \(I_A(w)=1\) for \(w \in A\) and \(I_A(w)=0\) for \(w \not \in A\).
See, e.g., [11, Theorem 27.1] for sufficient conditions for a convex function to attain its minimum.
Solutions of problem (45) and of the one with bPOE constraints are available at http://uryasev.ams.stonybrook.edu/index.php/research/testproblems/financial_engineering/structuring-step-up-cdo/.
The codes and solutions of this problem are available at http://uryasev.ams.stonybrook.edu/index.php/research/testproblems/financial_engineering/structuring-step-up-cdo/.
Portfolio Safeguard software is a product of American Optimal Decisions, Inc. (http://aorda.com). To optimize convex nonsmooth functions, PSG solves a sequence of linear or quadratic programming problems. For large-scale optimization problems, PSG can call the GUROBI solver (Gurobi Optimization, https://www.gurobi.com/).
References
Bertsekas, D.: Nonlinear programming. J. Oper. Res. Soc. 48(3), 334 (1997)
Grechuk, B., Molyboha, A., Zabarankin, M.: Chebyshev’s inequalities with law invariant deviation measures. Probab. Eng. Inf. Sci. 24, 145–170 (2010)
Grippo, L., Sciandrone, M.: Globally convergent block-coordinate techniques for unconstrained optimization. Optim. Methods Softw. 10(4), 587–637 (1999)
Grippo, L., Sciandrone, M.: On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26(3), 127–136 (2000)
Lyu, H.: Convergence and complexity of block coordinate descent with diminishing radius for nonconvex optimization, arXiv:2012.03503 (2020)
Mafusalov, A., Uryasev, S.: Buffered probability of exceedance: mathematical properties and optimization. SIAM J. Optim. 28(2), 1077–1103 (2018)
Norton, M., Uryasev, S.: Maximization of AUC and buffered AUC in binary classification. Math. Program. 174(1–2), 575–612 (2019)
Pertaia, G., Prokhorov, A., Uryasev, S.: A new approach to credit ratings. J. Bank. Finance 140, 106097 (2022)
Pinelis, I.: On the convexity of a certain set of random vectors. MathOverflow, https://mathoverflow.net/questions/415870/
Powell, M.: On search directions for minimization algorithms. Math. Program. 4(1), 193–201 (1973)
Rockafellar, R.T.: Convex Analysis, p. 451. Princeton University Press, Princeton, NJ (1970)
Rockafellar, R.T., Royset, J.O.: Random variables, monotone relations, and convex analysis. Math. Program. 148, 297–331 (2014)
Rockafellar, R.T., Royset, J.O.: On buffered failure probability in design and optimization of structures. Reliab. Eng. Syst. Saf. 95, 499–510 (2010)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
Rockafellar, R.T., Uryasev, S.: Conditional value-at-Risk for general loss distributions. J. Bank. Finance 26, 1443–1471 (2002)
Rockafellar, R.T., Uryasev, S., Zabarankin, M.: Generalized deviations in risk analysis. Finance Stoch. 10(1), 51–74 (2006)
Veremyev, A., Tsyurmasto, P., Uryasev, S.: Optimal structuring of CDO contracts: optimization approach. J. Credit Risk 8(4), 133–155 (2012)
Zabarankin, M., Uryasev, S.: Statistical Decision Problems: Selected Concepts and Portfolio Safeguard Case Studies. Springer, Berlin (2014)
Acknowledgements
We are grateful to Mr. Edward Cummings from Stony Brook University for the numerical implementation of the case study. We are also grateful to the referees for their comments and suggestions, which helped to improve the quality of the paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 A. Proof of Proposition 2.2
With notation
Proposition 2.2 can be reformulated as follows.
Proposition A.1
For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\),
Proof
Let \({\overline{p}}'({{\textbf{X}}})\) be the right-hand side of (49). We first prove that \({\overline{p}}'({{\textbf{X}}}) \leqslant {\overline{p}}({{\textbf{X}}})\). Let \({{{\mathcal {W}}}}\) be the feasible set in (49). For every \(W \in {{{\mathcal {W}}}}\), and every \({{\textbf{a}}}\in {\mathbb {R}}^n_+\),
Since the second and the third terms in the right-hand side are non-negative,
and since \(W \in {{{\mathcal {W}}}}\) and \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) are arbitrary, \({\overline{p}}'({{\textbf{X}}}) \leqslant {\overline{p}}({{\textbf{X}}})\) follows.
We next prove \({\overline{p}}'({{\textbf{X}}}) \geqslant {\overline{p}}({{\textbf{X}}})\). For \(n=1\), X is a random variable, and \({\overline{p}}'(X) \geqslant {\overline{p}}(X)\) was claimed in the proof of Proposition 2.5 in [6] (however, that proof was incomplete). In this case, let \({\underline{q}}_X(\alpha )\) be defined by (3) and let \({\underline{q}}^{-1}_X(x)\) be the inverse function of \({\underline{q}}_X(\alpha )\), which is well-defined for \(x\in [{\mathbb {E}}X, \sup X)\); see [6].
Then consider the following cases:
- If \(\sup X <0\), then \({{{\mathcal {W}}}}=\{0\}\) and \({\overline{p}}'(X)={\mathbb {E}}[0]=0\).
- If \(\sup X = 0\), then \(I_{\{X=0\}} \in {{{\mathcal {W}}}}\), hence \({\overline{p}}'(X)\geqslant {\mathbb {E}}[I_{\{X=0\}}]={\mathbb {P}}[X=0]={\mathbb {P}}[X=\sup X]\).
- If \({{\mathbb {E}}}[X]< 0 < \sup X\), then let \(t=q_X({\underline{q}}^{-1}_X(0))\) and let \(A^-\), \(A^0\), \(A^+\) be the events \(X<t\), \(X=t\), and \(X>t\), respectively. The definition of t implies that
$$\begin{aligned} 1-{\mathbb {P}}[A^+]-{\mathbb {P}}[A^0]\leqslant {\underline{q}}^{-1}_X(0) \leqslant 1-{\mathbb {P}}[A^+]. \end{aligned}$$(50)By definition of \({\underline{q}}^{-1}_X(0)\),
$$\begin{aligned} \begin{aligned} 0&= \int _{{\underline{q}}^{-1}_X(0)}^1 H_{X}(\alpha )d\alpha = \int _{{{\underline{q}}^{-1}_X(0)}}^{1-{\mathbb {P}}[A^+]} t\, d\alpha + \int _{1-{\mathbb {P}}[A^+]}^1 H_X(\alpha )d\alpha \\&= t\left( 1-{\mathbb {P}}[A^+]-{{\underline{q}}^{-1}_X(0)}\right) + \int _{A^+} X(\omega )d{\mathbb {P}}= r t {\mathbb {P}}[A^0] + \int _{A^+} X(\omega )d{\mathbb {P}}, \end{aligned} \end{aligned}$$where \(r=\frac{1-{\mathbb {P}}[A^+]-{{\underline{q}}^{-1}_X(0)}}{{\mathbb {P}}[A^0]}\) if \({\mathbb {P}}[A^0]>0\) and \(r=0\) if \({\mathbb {P}}[A^0]=0\). Note that \(0\leqslant r \leqslant 1\) by (50).
Now, let W be an r.v. assuming values of 0, r, and 1 on \(A^-\), \(A^0\), and \(A^+\), respectively. Then
$$\begin{aligned} {{\mathbb {E}}}[XW] = 0 + r \int _{A^0} X(\omega )d{\mathbb {P}}+ \int _{A^+} X(\omega )d{\mathbb {P}}= r t {\mathbb {P}}[A^0] + \int _{A^+} X(\omega )d{\mathbb {P}}= 0, \end{aligned}$$hence \(W \in {{{\mathcal {W}}}}\). Thus,
$$\begin{aligned} {\overline{p}}'(X)\geqslant {{\mathbb {E}}}[W] = r {\mathbb {P}}[A^0] + {\mathbb {P}}[A^+] = 1 - {{\underline{q}}^{-1}_X(0)}. \end{aligned}$$
- If \(0 \leqslant {{\mathbb {E}}}[X]\), then \(W \equiv 1 \in {{{\mathcal {W}}}}\), hence \({\overline{p}}'(X)= {{\mathbb {E}}}[1]=1\).
In summary, \({\overline{p}}'(X)\geqslant {\overline{p}}^+_X(0)\), where \({\overline{p}}^+_X(x)\) is defined in Definition 3.9 of [6]. By Proposition 3.10 in [6], \({\overline{p}}^+_X(0)=\min _{a \geqslant 0} {\mathbb {E}}[a X + 1]^+\), which finishes the proof of \({\overline{p}}'(X) \geqslant {\overline{p}}(X)\) for \(n=1\).
Now let n be an arbitrary positive integer. By contradiction, suppose that \({\overline{p}}'({{\textbf{X}}}) < {\overline{p}}({{\textbf{X}}})\) and choose any \(k \in ({\overline{p}}'({{\textbf{X}}}), {\overline{p}}({{\textbf{X}}}))\). Let
Then K is a convex set, and \({\overline{p}}'({{\textbf{X}}}) < k\) implies that \({\mathbb {R}}^n_+ \cap K = \emptyset\). By the separating hyperplane theorem, there exist a vector \({\textbf{c}} \in {\mathbb {R}}^n\) and a constant b such that (i) \({\textbf{c}}^\top {{\textbf{y}}}\geqslant b\) for all \({{\textbf{y}}}\in {\mathbb {R}}^n_+\) but (ii) \({\textbf{c}}^\top {{\textbf{y}}}< b\) for all \({{\textbf{y}}}\in K\). Condition (i) implies that \({\textbf{c}} \geqslant 0\) and \(b \leqslant 0\). Then (ii) implies that for every W satisfying \(0\leqslant W \leqslant 1\) and \({{\mathbb {E}}}[W]\geqslant k\), we have \(0 \geqslant b > {\textbf{c}}^\top {{\mathbb {E}}}[{{\textbf{X}}}W] = {{\mathbb {E}}}[Y W]\), where \(Y = {\textbf{c}}^\top {{\textbf{X}}}\). But this implies that \({\overline{p}}'(Y) < k\). On the other hand,
Since Y is a one-dimensional random variable, this contradicts the inequality \({\overline{p}}'(Y) \geqslant {\overline{p}}(Y)\) proved above for the case \(n=1\). \(\square\)
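As a numeric sanity check of the case \(n=1\), the formula \({\overline{p}}(X)=\min _{a \geqslant 0} {\mathbb {E}}[aX+1]^+\) can be evaluated on a two-point r.v. with \({\mathbb {E}}[X]<0<\sup X\). The sketch below uses hypothetical scenarios and a grid search in place of the one-dimensional minimization:

```python
import numpy as np

X = np.array([-2.0, 1.0])  # equally likely scenarios; E[X] = -0.5 < 0 < sup X = 1

# E[aX + 1]^+ over a grid of a >= 0 (the grid contains the minimizer a = 0.5)
a_grid = np.linspace(0.0, 5.0, 100_001)
values = np.maximum(np.outer(a_grid, X) + 1.0, 0.0).mean(axis=1)
bpoe = values.min()           # buffered probability of exceedance at 0
poe = np.mean(X >= 0.0)       # ordinary probability of exceedance at 0
print(round(bpoe, 6), poe)    # bpoe (~0.75) upper-bounds poe (0.5)
```

Here the buffered probability 0.75 is the fraction of the worst-case tail whose average loss equals the threshold 0, which strictly exceeds the plain exceedance probability 0.5, consistent with \({\overline{p}}(X)\) being an upper bound.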
1.2 B. Proof of Proposition 2.3
With notation (48), Proposition 2.3 can be reformulated as
Proposition B.1
On an atomless probability space \(\Omega\), \({\overline{p}}({{\textbf{X}}})\) is the minimal Schur-convex upper bound for \(p({{\textbf{X}}})={\mathbb {P}}[{{\textbf{X}}}\geqslant 0]\).
Proof
For any \({{\textbf{a}}}\in {\mathbb {R}}^n_+\), the function \(g({{\textbf{x}}})=[{{\textbf{a}}}^\top {{\textbf{x}}}+ 1]^+\) is convex in \({{\textbf{x}}}\), hence \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) implies that \({\mathbb {E}}[{{\textbf{a}}}^\top {{\textbf{X}}}+ 1]^+ \geqslant {\mathbb {E}}[{{\textbf{a}}}^\top {{\textbf{Y}}}+ 1]^+\). Taking the infimum over all \({{\textbf{a}}}\in {\mathbb {R}}^n_+\), we obtain \({\overline{p}}({{\textbf{X}}}) \geqslant {\overline{p}}({{\textbf{Y}}})\), and consequently, the function \({\overline{p}}({{\textbf{X}}})\) is Schur-convex. Also, for any \({{\textbf{a}}}\in {\mathbb {R}}^n_+\), we have \([{{\textbf{a}}}^\top {{\textbf{X}}}+ 1]^+ \geqslant I_{\{{{\textbf{X}}}\geqslant 0\}}\), so that \({\overline{p}}({{\textbf{X}}}) \geqslant {{\mathbb {E}}}[I_{\{{{\textbf{X}}}\geqslant 0\}}]={\mathbb {P}}[{{\textbf{X}}}\geqslant 0] = p({{\textbf{X}}})\), i.e., \({\overline{p}}({{\textbf{X}}})\) is indeed an upper bound for \(p({{\textbf{X}}})\). It remains to prove that this upper bound is minimal, i.e., for every \({{\textbf{X}}}\in L^{1,n}(\Omega )\), there exists \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) such that \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\) and \({\mathbb {P}}[{{\textbf{Y}}}\geqslant 0] \geqslant {\overline{p}}({{\textbf{X}}})\).
Let \({{{\mathcal {W}}}}\) be the feasible set in (49). Consider the probability space \(\Omega '=\Omega \times [0,1]\), where [0, 1] is the unit interval with Lebesgue measure. Write elements of \(\Omega '\) as \((\omega ,t)\), where \(\omega \in \Omega\) and \(t\in [0,1]\). For arbitrary \(W \in {{{\mathcal {W}}}}\), let \(A=\{(\omega ,t) \in \Omega ': t \leqslant W(\omega )\}\). Then \({\mathbb {P}}'(A)={{\mathbb {E}}}[W]\), where \({\mathbb {P}}'\) is the probability measure on \(\Omega '\). For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), let \({{\textbf{X}}}' \in L^{1,n}(\Omega ')\) be the random vector defined by \({{\textbf{X}}}'(\omega ,t)={{\textbf{X}}}(\omega )\). Then the random vectors \({{\textbf{X}}}\) and \({{\textbf{X}}}'\) have the same distribution. Let \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|I_A]\), where \(I_A\) is the indicator function of the event A. Then \(W \in {{{\mathcal {W}}}}\) implies that \({{\mathbb {E}}}[X_i W] \geqslant 0\), \(i=1,\dots ,n\), which in turn implies that \({{\mathbb {E}}}[X'_i|I_A=1]\geqslant 0\) for each component \(X'_i\) of \({{\textbf{X}}}'\), or, equivalently, that \({{\textbf{Y}}}'(\omega ,t) \geqslant 0\) whenever \((\omega ,t)\in A\). Consequently, \({\mathbb {P}}'[{{\textbf{Y}}}'\geqslant 0] \geqslant {\mathbb {P}}'(A) = {{\mathbb {E}}}[W]\). Moreover, \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|I_A]\) implies that \({{\textbf{X}}}' \geqslant _{cx} {{\textbf{Y}}}'\). Hence, if there exists \({{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}\) such that \({\mathbb {P}}[{{\textbf{Y}}}\geqslant 0]={\mathbb {P}}'[{{\textbf{Y}}}'\geqslant 0]\), then, since \(W \in {{{\mathcal {W}}}}\) is arbitrary, \(\sup _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}}{\mathbb {P}}[{{\textbf{Y}}}\geqslant 0] \geqslant \sup _{W \in {{{\mathcal {W}}}}}{{\mathbb {E}}}[W] = {\overline{p}}({{\textbf{X}}})\), where the last equality is (49).
Since \(\Omega\) is atomless, there exists a pair of random vectors \({{\textbf{X}}}'' \in L^{1,n}(\Omega )\) and \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) with the same joint distribution as \({{\textbf{X}}}'\) and \({{\textbf{Y}}}'\). Then \({\mathbb {P}}[{{\textbf{Y}}}\geqslant 0]={\mathbb {P}}'[{{\textbf{Y}}}'\geqslant 0]\) and \({{\textbf{X}}}'' \geqslant _{cx} {{\textbf{Y}}}\). Since \({{\textbf{X}}}\) and \({{\textbf{X}}}''\) have the same distribution, \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), as required. \(\square\)
1.3 C. Proof of Proposition 2.5
Proposition 2.5 can be reformulated as follows.
Proposition C.1
Let
Also, let
Then on an atomless probability space,
Proof
First, let us show that (51) is a dual problem to (52): introduce dual variables \(a_i \geqslant 0\) and \(0 \leqslant M \in L^{1}(\Omega )\) for constraints \({\mathbb {E}}[X_i W_i] \geqslant 0\) and \(\sum _{i=1}^n W_i \leqslant 1\), respectively. Minimization of the Lagrangian for (52) with respect to \({{\textbf{a}}}= (a_1,\ldots ,a_n)\) and M yields
where the last equality follows from comparing M with \(m = \max \{a_1 X_1 + 1,\dots ,a_n X_n+1\}\). Indeed, if \(m < 0\), then the subdifferential \(\partial _M (M + \sum _{i=1}^n [1 + a_i X_i - M]^+) = \{1\}\) for \(M \geqslant 0\), so \(M = 0\) is optimal; otherwise, at \(M = m\), the subdifferential is \(1 + k \cdot [-1, 0] + (n - k) \cdot 0 \ni 0\) for some \(k\in \{1,\dots ,n\}\), so \(M = m\) is optimal.
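For completeness, the Lagrangian step can be written out. The sketch below is a reconstruction from the proof's own notation: it assumes (52) maximizes \({\mathbb {E}}[\sum _{i=1}^n W_i]\) over \(W_i \geqslant 0\) subject to the two constraint families to which the dual variables \(a_i\) and M are attached, rather than quoting (52) verbatim.

```latex
\mathcal{L}(W,\mathbf{a},M)
  = \mathbb{E}\Big[\sum_{i=1}^n W_i\Big]
    + \sum_{i=1}^n a_i\,\mathbb{E}[X_i W_i]
    + \mathbb{E}\Big[M\Big(1-\sum_{i=1}^n W_i\Big)\Big]
  = \mathbb{E}[M] + \sum_{i=1}^n \mathbb{E}\big[W_i\,(a_i X_i + 1 - M)\big].
```

Taking the supremum over \(W_i \geqslant 0\) pointwise forces \(M \geqslant a_i X_i + 1\) almost surely for every i; together with \(M \geqslant 0\), minimizing \({\mathbb {E}}[M]\) then gives \(M=\max \{a_1X_1+1,\dots ,a_nX_n+1,\,0\}\), i.e., the dual objective \({\mathbb {E}}[\max _i\{a_iX_i+1\}]^+\) minimized over \({{\textbf{a}}}\in {\mathbb {R}}^n_+\).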
Now, let \({{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}\) and \({{\textbf{a}}}\in {\mathbb {R}}^n_+\) be arbitrary. The function \(u({{\textbf{x}}})=[\max \{a_1 x_1,\dots ,a_n x_n\} + 1]^+\) is the maximum of finitely many affine functions and is therefore convex. Consequently,
This implies that
We next adapt the proof of Proposition 2.3 to prove that
Indeed, take any \(W_1, \dots , W_n\) in the feasible set of (52). Consider the probability space \(\Omega '=\Omega \times [0,1]\), where [0, 1] is the unit interval with Lebesgue measure. Write elements of \(\Omega '\) as \((\omega ,t)\), where \(\omega \in \Omega\) and \(t\in [0,1]\). For every \(j = 1,\dots ,n\), let \(A_j=\{(\omega ,t) \in \Omega ':\sum _{i=1}^{j-1}W_i(\omega ) < t \leqslant \sum _{i=1}^{j}W_i(\omega )\}\). Also, let \(A=\{(\omega ,t) \in \Omega ':\sum _{i=1}^n W_i(\omega ) < t\}\). Then the sets \(A_1,\dots ,A_n,A\) form a partition of \(\Omega '\), which we denote \({{{\mathcal {F}}}'}\). Note that \(1-{\mathbb {P}}'(A)={{\mathbb {E}}}[{\textbf{e}}^\top {{\textbf{W}}}]\), where \({\mathbb {P}}'\) is the probability measure on \(\Omega '\) and \({\textbf{e}}^\top {{\textbf{W}}}=\sum _{i=1}^n W_i\). For any \({{\textbf{X}}}\in L^{1,n}(\Omega )\), let \({{\textbf{X}}}' \in L^{1,n}(\Omega ')\) be the random vector defined by \({{\textbf{X}}}'(\omega ,t)={{\textbf{X}}}(\omega )\). Then the random vectors \({{\textbf{X}}}\) and \({{\textbf{X}}}'\) have the same distribution. Let \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|{{{\mathcal {F}}}'}]\). Since \(W_1, \dots , W_n\) are in the feasible set of (52), we have \({{\mathbb {E}}}[X_i W_i] \geqslant 0\), \(i=1,\dots ,n\), which in turn implies that \({{\mathbb {E}}}[X'_i|I_{A_i}=1]\geqslant 0\) for each component \(X'_i\) of \({{\textbf{X}}}'\), or, equivalently, that \(\max _i Y'_i(\omega ,t) \geqslant 0\) whenever \((\omega ,t)\not \in A\). Consequently, \({\mathbb {P}}'[\max \{Y'_1,\dots ,Y'_n\}\geqslant 0] \geqslant 1-{\mathbb {P}}'[A] = {{\mathbb {E}}}[{\textbf{e}}^\top {{\textbf{W}}}]\).
If there exists \({{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}\) such that \({\mathbb {P}}[\max \{Y_1,\dots ,Y_n\}\geqslant 0]={\mathbb {P}}'[\max \{Y'_1,\dots ,Y'_n\}\geqslant 0]\), then, since \(W_1, \dots , W_n\) in the feasible set of (52) are arbitrary, this would imply that \({\overline{F}}_3({{\textbf{X}}}) = \sup _{{{\textbf{Y}}}\leqslant _{cx} {{\textbf{X}}}}{\mathbb {P}}[\max \{Y_1,\dots ,Y_n\}\geqslant 0] \geqslant \sup {{\mathbb {E}}}[{\textbf{e}}^\top {{\textbf{W}}}] = {\overline{F}}_2({{\textbf{X}}})\), where the supremum is over the feasible set of (52). Indeed, \({{\textbf{Y}}}'={{\mathbb {E}}}[{{\textbf{X}}}'|{{{\mathcal {F}}}'}]\) implies that \({{\textbf{X}}}' \geqslant _{cx} {{\textbf{Y}}}'\). Since \(\Omega\) is atomless, there exists a pair of random vectors \({{\textbf{X}}}'' \in L^{1,n}(\Omega )\) and \({{\textbf{Y}}}\in L^{1,n}(\Omega )\) with the same joint distribution as \(({{\textbf{X}}}', {{\textbf{Y}}}')\). Then \({\mathbb {P}}[\max \{Y_1,\dots ,Y_n\}\geqslant 0]={\mathbb {P}}'[\max \{Y'_1,\dots ,Y'_n\}\geqslant 0]\) and \({{\textbf{X}}}'' \geqslant _{cx} {{\textbf{Y}}}\). Since \({{\textbf{X}}}\) and \({{\textbf{X}}}''\) have the same distribution, \({{\textbf{X}}}\geqslant _{cx} {{\textbf{Y}}}\), as required.
Finally, \({\overline{F}}_2({{\textbf{X}}}) \geqslant {\overline{F}}_1({{\textbf{X}}})\) follows from the fact that (51) is the problem dual to (52). \(\square\)
1.4 D. The algorithm for solving problem (28)
Algorithm 1 (bCDF and rCDF in objective).
Step 1. Set \({{\textbf{a}}}=(1,\dots ,1)\), \(\epsilon\) = small number, I = F or H.
Step 2. Solve \(\min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{p}}_{g_1({{\textbf{a}}},{{\textbf{w}}})}(0)\); let \(\widetilde{{{\textbf{w}}}}\) be its optimal solution.
Step 3. Solve \(\min _{{{\textbf{a}}}\in {{\textbf{A}}}} {\overline{p}}_{g_1({{\textbf{a}}},\widetilde{{{\textbf{w}}}})}(0)\); let \(\widetilde{{{\textbf{a}}}}\) be its optimal solution. Set \(\widetilde{p} = {\overline{p}}_{g_1(\widetilde{{{\textbf{a}}}},\widetilde{{{\textbf{w}}}})}(0)\).
Step 4. Use \(\widetilde{{{\textbf{w}}}}\) as an initial approximation to solve \(\min _{{{\textbf{w}}}\in {{\textbf{W}}}} {\overline{p}}_{g_1(\widetilde{{{\textbf{a}}}},{{\textbf{w}}})}(0)\); let \({{\textbf{w}}}^*\) be its optimal solution.
Step 5. Use \(\widetilde{{{\textbf{a}}}}\) as an initial approximation to solve \(\min _{{{\textbf{a}}}\in {{\textbf{A}}}} {\overline{p}}_{g_1({{\textbf{a}}},{{\textbf{w}}}^*)}(0)\); let \({{\textbf{a}}}^*\) be its optimal solution. Set \(p^* = {\overline{p}}_{g_1({{\textbf{a}}}^*,{{\textbf{w}}}^*)}(0)\).
Step 6. If \(\widetilde{p} - p^* < \epsilon\), then stop; otherwise set \(\widetilde{{{\textbf{w}}}} = {{\textbf{w}}}^*\), \(\widetilde{{{\textbf{a}}}} = {{\textbf{a}}}^*\), \(\widetilde{p} = p^*\), and go to Step 4.
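The alternation in Algorithm 1 can be made concrete in scenario form. The sketch below minimizes a bCDF-type objective \({\mathbb {E}}[\max _i\{a_i X_i({{\textbf{w}}})+1\}]^+\) for two loss components linear in \({{\textbf{w}}}\), alternating the \({{\textbf{w}}}\)- and \({{\textbf{a}}}\)-blocks; the data, problem sizes, and the use of scipy's general-purpose solvers are assumptions for illustration, not the PSG implementation used in the case study.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical scenario data: two loss components, each linear in the decision w
rng = np.random.default_rng(0)
N, K = 2000, 5
L1 = rng.normal(0.0, 1.0, (N, K))   # scenarios of the first loss component
L2 = rng.normal(0.0, 1.0, (N, K))   # scenarios of the second loss component
x = np.array([0.5, 0.5])            # fixed thresholds (an assumption)

def objective(a, w):
    # scenario estimate of E[max{a1*X1(w) + 1, a2*X2(w) + 1}]^+
    X1 = L1 @ w - x[0]
    X2 = L2 @ w - x[1]
    inner = np.maximum(a[0] * X1 + 1.0, a[1] * X2 + 1.0)
    return float(np.mean(np.maximum(inner, 0.0)))

simplex = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
w_bounds = [(0.0, 1.0)] * K
a_bounds = [(0.0, None)] * 2

a, w = np.ones(2), np.full(K, 1.0 / K)     # Step 1: a = (1,...,1)
p_tilde = objective(a, w)
for _ in range(50):                        # Steps 4-6: alternate the two blocks
    w_new = minimize(lambda v: objective(a, v), w,
                     bounds=w_bounds, constraints=simplex).x
    a_new = minimize(lambda b: objective(b, w_new), a, bounds=a_bounds).x
    p_star = objective(a_new, w_new)
    if p_star < p_tilde:                   # accept only improving block updates
        w, a = w_new, a_new
    if p_tilde - p_star < 1e-7:            # stopping test of Step 6
        break
    p_tilde = p_star
print(round(p_tilde, 6))
```

Each block sub-problem is convex for the other block fixed, which is what makes the alternating scheme well-behaved; the accept-only-if-improving guard keeps the iterate sequence monotone even when a generic solver terminates early on the nonsmooth objective.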
1.5 E. The algorithm for solving problem (37)
Algorithm 2 (bCDF and rCDF in constraint).
Step 1. Set \({{\textbf{a}}}=(1,\dots ,1)\), \(\epsilon\) = small number.
Step 2. Solve (37) with respect to \({{\textbf{w}}}\); let \(\widetilde{{{\textbf{w}}}}\) be its optimal solution. Set \(\widetilde{v} = V(\widetilde{{{\textbf{w}}}})\).
Step 3. Solve (38) and (39) with \({{\textbf{w}}}=\widetilde{{{\textbf{w}}}}\); let \(\widetilde{{{\textbf{a}}}}_F\) and \(\widetilde{{{\textbf{a}}}}_H\) be optimal solutions of (38) and (39), respectively.
Step 4. Use \(\widetilde{{{\textbf{w}}}}\) as an initial approximation to solve (37) with respect to \({{\textbf{w}}}\) with \({{\textbf{a}}}_F=\widetilde{{{\textbf{a}}}}_F\) and \({{\textbf{a}}}_H=\widetilde{{{\textbf{a}}}}_H\); let \({{\textbf{w}}}^*\) be its optimal solution. Set \(v^* = V({{\textbf{w}}}^*)\).
Step 5. Use \(\widetilde{{{\textbf{a}}}}_F\) and \(\widetilde{{{\textbf{a}}}}_H\) as initial approximations to solve (38) and (39) with \({{\textbf{w}}}={{\textbf{w}}}^*\); let \({{\textbf{a}}}_F^*\) and \({{\textbf{a}}}_H^*\) be optimal solutions of (38) and (39), respectively.
Step 6. If \(\widetilde{v} - v^* < \epsilon\), then stop; otherwise set \(\widetilde{{{\textbf{w}}}} = {{\textbf{w}}}^*\), \(\widetilde{{{\textbf{a}}}}_F = {{\textbf{a}}}_F^*\), \(\widetilde{{{\textbf{a}}}}_H = {{\textbf{a}}}_H^*\), \(\widetilde{v} = v^*\), and go to Step 4.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Grechuk, B., Zabarankin, M., Mafusalov, A. et al. Buffered and Reduced Multidimensional Distribution Functions and Their Application in Optimization. Optim Lett 18, 403–426 (2024). https://doi.org/10.1007/s11590-023-02045-1