Application of Hamilton–Jacobi–Bellman Equation/Pontryagin’s Principle for Constrained Optimal Control

Weston, Jerome; Tolić, Domagoj; Palunko, Ivana

doi:10.1007/s10957-023-02364-4

Application of Hamilton–Jacobi–Bellman Equation/Pontryagin’s Principle for Constrained Optimal Control

Open access
Published: 09 January 2024

Volume 200, pages 437–462, (2024)
Cite this article

Download PDF

You have full access to this open access article

Journal of Optimization Theory and Applications Aims and scope Submit manuscript

Application of Hamilton–Jacobi–Bellman Equation/Pontryagin’s Principle for Constrained Optimal Control

Download PDF

Jerome Weston^1,2,
Domagoj Tolić^1,3 &
Ivana Palunko^1,2

1045 Accesses
Explore all metrics

Abstract

This article applies novel results for infinite- and finite-horizon optimal control problems with nonlinear dynamics and constraints. We use the Valentine transformation to convert a constrained optimal control problem into an unconstrained one and show uniqueness of the value function to the corresponding Hamilton–Jacobi–Bellman (HJB) equation. From there, we show how to approximate the solution of the initial (in)finite-horizon problem with a family of solutions that is $\varGamma $-convergent. Optimal solutions are efficiently obtained via a solver based on Pontryagin’s Principle (PP). The proposed methodology is demonstrated on the path planning problem using the full nonlinear dynamics of an unmanned aerial vehicle (UAV) and autonomous underwater vehicle (AUV) involving state constraints in 3D environments with obstacles.

UAV Path Planning Using Optimization Approaches: A Survey

Article 18 April 2022

A Real-time Fuzzy Interacting Multiple-Model Velocity Obstacle Avoidance Approach for Unmanned Aerial Vehicles

Article Open access 15 April 2024

Intelligent fractional-order sliding mode control based maneuvering of an autonomous vehicle

Article 17 April 2024

1 Introduction

Building upon the recent existence and uniqueness results regarding the Hamilton–Jacobi–Bellman (HJB) equation [2], this article proposes a novel framework for solving a general class of infinite- and finite-horizon optimal control problems with constraints. The class of optimal problems considered herein includes dynamics of the form ${\dot{x}} = a(t,x) + b(t,x)u$ (additional structure mentioned in Sect. 3) and performance indices with locally bounded, Lipschitz and convex integral functions. Elliptic, and therefore convex, state constraints are considered and handled via the Valentine transformation. In addition, the admissible control signals belong to time-varying convex sets.

We utilize the off-the-shelf software package DIDO [16, 20], built upon the pseudo-spectral analysis and Pontryagin’s principle (PP), to numerically obtain trajectories/controls starting from various initial positions. Since DIDO outputs at most one trajectory/control per computation, in order to ensure the uniqueness of trajectory/control for a given problem, we impose a smoothness condition on our model. In a nutshell, given a constrained optimal control problem, we extend earlier HJB results devised for unconstrained problems and make a connection with PP to facilitate the application of PP-based solvers.

The proposed framework revolves around infinite-horizon Bolza-type cost functions with running costs exponentially decaying in time. This class of integral functions leads to existence and uniqueness of the HJB solutions [2], and mitigates the HJB solution-seeking process. We show $\varGamma $-convergence of solutions with these cost functions to the solutions of the original constrained problems whose running costs do not have to decay exponentially in time. Essentially, we demonstrate how to employ our infinite-horizon framework to approximate the original constrained optimal control problems.

Prevalent approaches toward solving HJB are found in, among others, [1, 2, 7, 13, 15]. Owing to recent advances in computational power, a large portion of nowadays approaches are based on Dynamic Programming (DP) such as Approximate Dynamic Programming (ADP) and Reinforcement Learning (RL) [6]. ADP refers to model-based approaches while RL refers to model-free approaches. It is to be noted that the approach delineated herein is model-based. However, ADP and RL do not deal well with constraints [13], but so-called soft constraints are typically imposed through penalty terms in cost functions, which is yet another incentive behind our framework. A somewhat more mature and more akin to our approach is Model Predictive Control (MPC) [7]. However, optimality of MPC solutions is typically traded for computational efficiency and stability guarantees. The present work aims for maintaining optimality when resorting to numeric solutions of HJB.

This article is a comprehensive version of [23] and complements it in several directions. In particular, (a) dynamics for which the control uniqueness cannot be guaranteed (but merely the value-function uniqueness) are taken into account (b) along with corresponding numerical examples involving autonomous robots (e.g., the AUV and UAV), and (c) all mathematical proofs and pertaining discussions omitted from [23] are enclosed here.

The remainder of this article is organized as follows. Section 2 introduces the utilized notation and definitions. The infinite- and finite-horizon constrained optimal control problems considered herein are formulated in Sect. 3. Section 4 proposes a methodology to successfully solve the problems of interest. Section 5 establishes relations between solutions obtained in Sect. 4 and the original infinite- and finite-horizon problems from Sect. 3. Section 6 provides linear and nonlinear numerical examples. The conclusions and future research avenues are in Sect. 7. Appendix brings several technical results aiding in article self-containment.

2 Preliminaries

Going forward, we use the following conventions. We denote by $|\cdot |$, $|\cdot |_{\infty }$, $|\cdot |_{m}$, and $\langle \cdot ,\cdot \rangle $ the Euclidean norm, the max norm, the matrix norm induced by the Euclidean norm, and the inner product in $\mathbb {R}^{k}$, respectively. For a nonempty set $C \subset \mathbb {R}^{k}$, we denote its closure by ${\overline{C}}$ and its boundary by $\partial C$. For a matrix M, we denote its Moore-Penrose (left) inverse by $M^{\dag }$.

For row vectors we use $r = [r_{1}\; \ldots \; r_{N_{x}}]$ and for column vectors we use $r = [r_{1};\; \ldots ;\; r_{N_{x}}]$, where $r_i$’s are scalars. When the argument of a function $f:\mathbb {R}\rightarrow \mathbb {R}$ is a row vector r, it means $f(r) = [f(r_{1})\; \ldots \; f(r_{N_{x}})]$. Likewise, for a column vector r, we have $f(r) = [f(r_{1});\; \ldots ;\; f(r_{N_{x}})]$ and I(r) be a diagonal matrix where the diagonal entries are given by r. Unless stated otherwise, vectors are column vectors by default.

Define the matrix ${\hat{x}}\in SO(3)$ such that for $x\in \mathbb {R}^{3}$, we have

$$\begin{aligned} {\hat{x}} = \begin{bmatrix} 0 &{} x_3 &{} -x_2\\ -x_3 &{} 0 &{} x_1\\ x_2 &{} -x_1 &{} 0\\ \end{bmatrix}. \end{aligned}$$

Let $S(x)\in \mathbb {R}^{9\times 9}$, where $x\in \mathbb {R}^{3}$, be a block diagonal matrix with main diagonal matrices ${\hat{x}}$. For $x \in \mathbb {R}^{9}$, define $[x]_{3} \in \mathbb {R}^{3\times 3}$ as

$$\begin{aligned}{}[x]_{3} = \begin{bmatrix} x_1 &{} x_2 &{} x_3\\ x_4 &{} x_5 &{} x_6\\ x_7 &{} x_8 &{} x_9\\ \end{bmatrix}. \end{aligned}$$

Let I and J be closed intervals in $\mathbb {R}$. Let $\mu $ be the Lebesgue measure. Denote by $L^{1}(I;\;J)$ the set of all J-valued Lebesgue integrable functions on I. We say that $f \in L^{1}_{loc}(I;\;J)$ if $f \in L^{1}(K;\;J)$ for any compact subset $K \subset I$. We denote by ${\mathcal {L}}_{loc}$ the set of all functions $f \in L^{1}_{loc}([0,\infty );\;\mathbb {R}^{+})$ such that $\lim _{\sigma \rightarrow 0}\sup \limits \{\int _{J}f(\tau )\,d\tau \;: J\subset [0,\infty ),\; \mu (J)\le \sigma \} = 0$. We say that a function is $C^{1}$ if it is continuous and has continuous first derivatives in all variables. For a function f, denote $f^{(l)}$, $l\in \mathbb {N}$, as the $l^{\text {th}}$ time derivative and $\{f^{(l)}\}$ denote the finite sequence of time derivatives of f from one to some order. Suppose g is a function such that its evaluation can be related to a sequence, i.e., a series or a max/min function. Let $g(\{s_l\},K, v)$ be the corresponding value, where $\{s_l\}$ is the given sequence, K is the length of the sequence, and v is a vector whose components represent scalars that may arise for the computation of $g(\{s_l\},K, v)$. Note that the lengths of $\{s_l\}$ and v may be different.

Definition 2.1

A set-valued function $G: \mathbb {R}^{k} \rightsquigarrow \mathbb {R}^{n}$ is lower semicontinuous if, for all $x \in \mathbb {R}^{k}$, $\text {Lim} \inf \limits _{y\rightarrow x}G(y)\subset G(x)$.

Finally, we introduce a form of convergence for functionals.

Definition 2.2

Given a family of functionals $\{F_{n}: X\times U\rightarrow \mathbb {R} \}$ and a functional $F: X\times U\rightarrow \mathbb {R}$, we say that $F_{n}$ $\varGamma $-converges to F (denoted $F_{n}\xrightarrow {\varGamma } F$) if the following hold:

1.
For every $(x,u)\in X\times U$, there exists a sequence $(x_{n}, u_{n})$ such that, as $n \rightarrow +\infty $, $F(x,u)\ge \limsup F_{n}(x_{n},u_{n})$ and $(x_{n}, u_{n})\rightarrow (x,u)$ “in some sense". (This will be made precise later on.)
2.
For every sequence $(x_{n}, u_{n})$ and (x, u) such that $(x_{n}, u_{n})\rightarrow (x,u)$ “in some sense", as $n \rightarrow +\infty $, we have $F(x,u)\le \liminf F_{n}(x_{n},u_{n})$.

2.1 Motivations for Using DIDO

To establish a connection between PP and HJB, we consider the method of characteristics for first order nonlinear PDEs. For first order PDEs of the form

$$\begin{aligned} \begin{array}{rcl} F(x,V,V_{x}) = 0 &{},&{} x\in \varOmega \subset \mathbb {R}^{N_{x}},\\ V(x) = {\overline{V}}(x)&{},&{} x\in \partial \varOmega ,\\ \end{array} \end{aligned}$$

(1)

with a continuous function $F: \mathbb {R}^{N_{x}}\times \mathbb {R}\times \mathbb {R}^{N_{x}}\rightarrow \mathbb {R}$ and endpoint data ${\overline{V}}(x)$, one can construct a solution V by solving the characteristic ODEs (substituting $p = V_{x}$)

$$\begin{aligned} \begin{array}{rcl} {\dot{x}}_{i} = F_{p_{i}}, &{}&{} i = 1,\ldots ,n,\\ {\dot{V}} = \sum \limits _{i}p_{i} F_{p_{i}},&{}&{}\\ {\dot{p}}_{i} = -F_{x_{i}} -F_{V}p_{i}, &{}&{}\\ \end{array} \end{aligned}$$

(2)

with $x(0) = y,\; V(0) = V(y),\; p(0) = V_{x}(y)$ for $y\in \partial \varOmega $. It can be shown ([15, Section 7.2], [5, Chapter 8]) that the characteristic ODEs for HJB are the canonical ODEs for PP. It should be noted that the underlying derivation is based on sufficient smoothness of all functions involved—including V being $C^{1}$. Fortunately, characteristic ODEs can still be used with weaker assumptions on V ([15, Section 7.2], [5, Chapter 8]).

A side motivation to connect PP and HJB is the following. Controls generated by PP are, in general, open-loop, that is, $u = u(t)$. Controls given by HJB, in contrast, are closed-loop, that is, $u = u(t,x)$. Consequently, the concept of real-time optimal control (RTOC) can be used to transform an open-loop control, such as those constructed by DIDO, into a closed-loop control. The idea behind RTOC is to assume that an open-loop control exists beforehand and then to account for nonzero computational time of the optimal trajectory (i.e., updates based on given data). Methods that could be used to transform open-loop controls generated by DIDO into closed-loop controls are found in [19, 21].

3 Problem Formulation

Before stating the problem, let us define our state space for ease of notation. Let $\mathbb {X}_{s}\subset \mathbb {R}^{N_{s}}$, with $N_{s} \in \{2,\; 3\}$, be the obstacle space and $\mathbb {X}_{ns}\subset \mathbb {R}^{k*N_{s}}$, with $k \in \mathbb {N}\cup \{0\}$, be the free space; for $k=0$, one has $\mathbb {X}_{ns} = \emptyset $. Let $\mathbb {X} = \mathbb {X}_{s}\times \mathbb {X}_{ns}$ with both $\mathbb {X}_{s}$ and $\mathbb {X}_{ns}$ as open, connected, and bounded sets whose cluster points fall within $\mathbb {X}$ or on $\partial {\mathbb {X}}$. The dimension of $\mathbb {X}$ is equal to $N_{x} = (k+1)N_{s}$. Here $\mathbb {X}_{s}$ represents the spatial region in which a mobile vehicle moves – typically in 2- or 3-dimensions. $\mathbb {X}_{ns}$ represents the space in which all other variables (i.e., velocities, rotations, etc.) are active. For vector $x_{ns}(t) = [x_{ns1}(t); \;\ldots ; x_{nsk}(t)]\in \mathbb {X}_{ns}$, components $x_{nsi}(t)$ are of dimension $N_{s}\times 1$. We also take the functions $t\mapsto u(t)$ to be measurable, admissible, and taking values in $U(t)\subset \mathbb {R}^{N_{u}}$ with U(t) bounded for all $t\in [t_0, t_f]$.

We consider the following optimal control problem:

$$\begin{aligned} \begin{array}{rclr} x(t)=[x_{s}(t); x_{ns}(t)]\in \mathbb {X} &{},&{} u(t) \in U(t) \subset \mathbb {R}^{N_{u}},\\ \inf \limits _{(x,u)}\; J(t_0,t_f,x_f,x,u) &{}=&{} E(x(t_{f}), x_{f}) + \int ^{t_f}_{t_0}l(s,x(s),u(s))ds \end{array} \end{aligned}$$

(3)

subject to

$$\begin{aligned} \begin{array}{rclr} {\dot{x}} &{}=&{} a(t,x)+b(t,x)u,\\ h^{L}&{}\le &{} h(x_{s}), \\ x(t_{0}) &{}=&{} x_{0}, \end{array} \end{aligned}$$

where $a: [t_{0},t_{f}]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1)N_{s}}$ and $b:[t_{0}, t_{f}]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1)N_{s}\times N_{u}}$ are defined as follows for $x = [x_{s};\; x_{ns1}; \;\ldots ; x_{nsk}]$ and some $K_{o}\in \mathbb {N}$:

$$\begin{aligned} a(t,x) = \begin{bmatrix} x_{ns1}\\ x_{ns2}\\ \vdots \\ x_{ns(K_{o}-1)}\\ a_{K_{o}}(t, x)\\ a_{e}(t, x) \end{bmatrix},\; b(t,x) = \begin{bmatrix} {\textbf{0}}_{N_{s}\times N_{u}}\\ {\textbf{0}}_{N_{s}\times N_{u}}\\ \vdots \\ {\textbf{0}}_{N_{s}\times N_{u}}\\ b_{K_{o}}(t, x)\\ b_{e}(t, x) \end{bmatrix}, \end{aligned}$$

(4)

with functions $a_{K_{o}}:[t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{N_{s}},\; a_{e}:[t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1-K_{o})*N_{s}}, \; b_{K_{o}}:[t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{N_{s}\times N_{u}},\text { and } b_{e}: [t_0, t_f]\times \mathbb {X}\rightarrow \mathbb {R}^{(k+1-K_{o})*N_{s}\times N_{u}}$ bounded, Lipschitz continuous, and componentwise convex in $x\in \mathbb {X}$ for almost all $t\ge t_{0}$. Also, we assume the dynamical system ${\dot{x}} = a(t,x)+b(t,x)u$ is controllable in $\mathbb {X}$ while satisfying all state constraints (the role and actual value of $K_{o}$ will be made clear in Sect. 4.1). It should be noted that many autonomous system models, such as AUV and UAV, follow this structure. Furthermore, U(t) is a compact and convex subset of $\mathbb {R}^{N_{u}}$ for each t.

Define $h: \mathbb {X}_{s} \rightarrow \mathbb {R}^{n_{o}}$ componentwise as

$$\begin{aligned} {h}_{i} (x_{s}) =\sum \limits ^{N_{s}}_{j = 1} \bigg (\frac{x_{sj}(t) - {\bar{x}}_{ij}}{r_{ij}}\bigg )^{2}, \end{aligned}$$

for $i = 1,\ldots ,n_{o}$, where ${\bar{x}}_{i} = ({\bar{x}}_{i1},\; {\bar{x}}_{i2},\;\ldots ,\; {\bar{x}}_{iN_{s}})$, $r_{i} = (r_{i1},\; r_{i2},\;\ldots ,\; r_{iN_{s}})$ represents the center and principal axis lengths of an ellipsoid, respectively, and $n_{o}$ represents the number of state constraints (that is, obstacles) in our model with $n_{o}\ge N_{s}$. The case for $0< n_o < N_s$ will appear in future works. We define $h^{L}$ componentwise as $h^{L}_{i} = ({\hat{r}}_{i}+d_{i})^{2}$, where ${\hat{r}}_{i}$ relates to principle axis lengths of a given obstacle (i.e., if the obstacle is a sphere, ${\hat{r}}_i$ is just the radius) and $d_{i} > 0$ represents a buffer distance for avoiding obstacles (for example, when mobile agents are represented merely as points, $d_{i}$ takes into account their actual dimensions so as to avoid collisions). Without loss of generality, it is also assumed that the obstacles are completely contained in $\mathbb {X}_{s}$ (just cut out the partially contained obstacles from $\mathbb {X}_{s}$). For further clarification, the inequality constraint $h^{L}\le h(x_s)$ should be interpreted componentwise as $h^{L}_{i}\le h_{i}(x_s)$ for $i = 1,\ldots ,n_o$. At this point we make an additional assumption on the vector $b_{K_o}(t,x)$, i.e., the vector $\{\nabla _{x}h(x_s)b_{K_o}(t,x)\}^{\dag }$, is convex and Lipschitz continuous in x.

Let l(t, x, u) be nonnegative, Lipschitz in x, uniformly continuous in u, lower semicontinuous in t, bounded for all $t \ge t_0$, strictly convex in u, and coercive and bounded from below in x and u. Note that since $\mathbb {X}$ and U(t) are compact, l(t, x, u) is bounded for any feasible trajectory-control pair. The terminal cost $E(\cdot )$ for finite-horizon problems is typically positive definite in $x(t_f)$ with respect to $x_f$ whereas it is absent in infinite-horizon problems. Clearly, we allow $t_f$ to be finite or infinite. In what follows, we seek to approximate a solution to problem (3) in a reasonable way.

4 Methodology

In order to approximate a solution to (3), we exploit the following auxiliary infinite-horizon optimal control problem:

$$\begin{aligned} \begin{array}{rclr} x(t) \in \mathbb {X} &{},&{} u(t) \in U(t),\\ \inf \limits _{(x,u)}\; J_{\infty }(t_0,x,u,\lambda ) &{}=&{} \int ^{\infty }_{t_0} F_{\lambda }(s,x(s),u(s))ds\\ \text {subject to }\\ {\dot{x}} &{}=&{} a(t,x) + b(t,x)u,\\ h^{L}&{}\le &{} h(x_{s}),\\ x(t_{0}) &{}=&{} x_{0},\\ \end{array} \end{aligned}$$

(5)

for $\lambda > 0$ where $F_{\lambda }(t,x,u) = l(t,x,u)e^{-\lambda t}$.

4.1 Application to State-Constrained Problem

One can transform (5) into an unconstrained problem using the Valentine transformation [18, 22]. Consider a slack function $\alpha : [t_0, \infty )\rightarrow \mathbb {R}^{n_{o}}$ such that

$$\begin{aligned} {\tilde{h}}(x(t)) + \frac{1}{2}(\alpha (t))^{2} = 0, \end{aligned}$$

(6)

where ${\tilde{h}}(x) = h_{L} - h(x)$ and $\frac{1}{2}(\alpha (t))^{2} = \frac{1}{2}[\alpha _{1}(t)^{2},\ldots ,\alpha _{n_{o}}(t)^{2}]^{\top }$. In [23], the differential equation

$$\begin{aligned} \frac{d}{dt}\Big ({\tilde{h}}(x(t)) + \frac{1}{2}(\alpha (t))^{2}\Big ) = 0, \end{aligned}$$

(7)

is rewritten as

$$\begin{aligned} {\nabla {\tilde{h}}(x(t))}\big (a(t,x)+b(t,x)u\big ) + I(\alpha (t)){\dot{\alpha }}(t)= 0, \end{aligned}$$

(8)

where $I(\alpha )$ is an $n_{o}\times n_{o}$ diagonal matrix with the main diagonal equal to $\alpha $. By the implicit function theorem, [23] writes the control u from (8) as $u = \varPhi (t,x(t),\alpha (t),{\dot{\alpha }}(t))$ and defines a new control ${\tilde{u}} ={\dot{\alpha }}$. This derivation is possible in [23] due to $K_{o} = 1$. In general, this might not be the case. For example, consider the quadrotor example in Sect. 6.3 with one obstacle (for simplicity). Here, with $x_{s} = [x;\; y;\; z] \in \mathbb {R}^{3}$, we have the following case:

$$\begin{aligned} \begin{array}{rl} \frac{d}{dt}({\tilde{h}}(x_{s}(t)) )&{} = \frac{d}{dt} \big (h_{L} - \big (\frac{x(t) - {\bar{x}}}{r_{11}}\big )^{2}+\big (\frac{y(t) - {\bar{y}}}{r_{12}}\big )^{2}+\big (\frac{z(t) - {\bar{z}}}{r_{13}}\big )^{2} \big ) \\ &{}= -2\bigg [\frac{x(t) - {\bar{x}}}{r_{11}}\; \frac{y(t) - {\bar{y}}}{r_{12}}\; \frac{z(t) - {\bar{z}}}{r_{13}}\bigg ]\bigg [ \frac{{\dot{x}}(t)}{r_{11}}; \frac{{\dot{y}}(t)}{r_{12}};\frac{{\dot{z}}(t)}{r_{13}}\bigg ]\\ &{}= -2\bigg [\frac{x(t) - {\bar{x}}}{r_{11}}\; \frac{y(t) - {\bar{y}}}{r_{12}}\;\frac{z(t) - {\bar{z}}}{r_{13}}\bigg ]*I(r_{-1;1})x_{ns1}, \end{array} \end{aligned}$$

where $I({r}_{-1;1})=\text {diag}\{r_{11}^{-1}; \ldots ; r_{1N_{x}}^{-1}\}$. We note that the time derivative of ${\tilde{h}}$ is not an explicit function of the control (given in the example as $\tau $, which is the notation more common in the UAV community). In fact, getting an expression similar to (8) for the quadrotor example requires the second time derivative of (6). For a general application of the Valentine transformation, one differentiates (6) until the control variable explicitly appears on the left hand side, that is,

$$\begin{aligned} {\tilde{h}}^{(K_{o})}(x(t)) + \{\text {terms involving }\alpha ^{(1)}(t),\ldots ,\alpha ^{(K_{o}-1)}(t)\}+\alpha (t)\alpha ^{(K_{o})}(t) = 0, \end{aligned}$$

(9)

with $K_{o}$ as the order of differentiation for which this happens (for the first time) and such that $\partial {\tilde{h}}^{(K_{o})}/\partial u $ (the partial derivative of ${\tilde{h}}^{(K_{o})}$ with respect to u) is nonzero. From the definition of $K_{o}$, we can replace ${\tilde{h}}^{(K_{o})}(x(t))$ in (9) with ${\tilde{h}}^{(K_{o})}(x(t),u(t))$. For a more detailed derivation, we first show the case with $n_{o} = 1$ and then follow with the general case of $n_{o}$ obstacles. This is possible due to the fact that the number of obstacles involved is of no concern to the application of the Valentine transformation (the condition of $n_o \ge N_{s}$ is necessary for convexity conditions in the proof of Theorem 4.1). First, consider the term ${\tilde{h}}^{(K_{o})}(x, u)$, which is in fact reduced to ${\tilde{h}}^{(K_{o})}(x, u) = -h^{(K_{o})}(x, u)$.

Next,

$$\begin{aligned} \begin{array}{rl} -h^{(K_{o})}(x, u) &{}=- \sum \limits ^{N_{s}}_{j=1}\frac{d^{K_{o}}}{dt^{K_{o}}}\Bigg (\bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg ) \bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )\Bigg )\\ &{} = - \sum \limits ^{N_{s}}_{j=1}\Bigg (\sum \limits ^{K_{o}}_{l=0} {K_{o}\atopwithdelims ()l} \bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )^{(K_{o}-l)}\bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )^{(l)}\Bigg )\text { (General Leibniz Rule)}\\ &{} = -\sum \limits ^{N_{s}}_{j=1}\Bigg (\sum \limits ^{K_{o}-1}_{l=1} {K_{o}\atopwithdelims ()l} \bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )^{(K_{o}-l)}\bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )^{(l)} + 2 \bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )\frac{x^{(K_{o})}_{sj}}{r_{1j}} \Bigg )\\ &{} = -2\sum \limits ^{N_{s}}_{j=1} \bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )\frac{x^{(K_{o})}_{sj}}{r_{1j}} - \sum \limits ^{K_{o}-1}_{l=1} {K_{o}\atopwithdelims ()l}\sum \limits ^{N_{s}}_{j=1}\bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )^{(K_{o}-l)}\bigg (\frac{x_{sj}-{\bar{x}}_{j}}{r_{1j}}\bigg )^{(l)}\\ &{} =-\nabla h(x_{s})(a_{K_{o}}(t,x) +b_{K_{o}}(t,x)u) - G_{s}(\{x^{(l)}_{s}\}; K_{o},\; r_{1}),\\ \end{array} \end{aligned}$$

(10)

where

$$\begin{aligned}{} & {} G_{s}(\{x^{(l)}_{s}\}; K_{o},\; {r}_{1}) \\{} & {} \quad = {\left\{ \begin{array}{ll} 2\sum \limits ^{\frac{1}{2}(K_{o}-1)}_{l=1}{K_{o}\atopwithdelims ()l}{\Big \langle x^{(K_{o}-l)}_{s},\; I^{2}(r_{-1;1})x^{(l)}_{s}\Big \rangle },&{} K_{o}\, \text {is odd,}\\ {K_{o}\atopwithdelims ()\frac{1}{2}K_{o}}{\Big \langle x^{(\frac{1}{2}K_{o})}_{s},\; I^{2}(r_{-1;1})x^{(\frac{1}{2}K_{o})}_{s}\Big \rangle }+2\sum \limits ^{\frac{1}{2}K_{o}-1}_{l=1}{K_{o}\atopwithdelims ()l}{\Big \langle x^{(K_{o}-l)}_{s},\; I^{2}(r_{-1;1})x^{(l)}_{s}\Big \rangle },&{} K_{o}\, \text {is even}. \end{array}\right. } \end{aligned}$$

Similarly,

$$\begin{aligned} \frac{d^{K_{o}}}{dt^{K_{o}}}\left( \frac{1}{2}\alpha (t)^{2}\right) = \alpha (t)\alpha ^{(K_{o})}(t) + G_{\alpha }\left( \{\alpha ^{(l)}\}; K_{o}\right) \end{aligned}$$

(11)

with

$$\begin{aligned} G_{\alpha }(\{\alpha ^{(l)}\}; K_{o}) = {\left\{ \begin{array}{ll} \sum \limits ^{\frac{1}{2}(K_{o}-1)}_{l=1}{K_{o}\atopwithdelims ()l}\alpha ^{(K_{o}-l)}\alpha ^{(l)},&{} K_{o} \, \text {is odd,}\\ \frac{1}{2}{K_{o}\atopwithdelims ()\frac{1}{2}K_{o}}(\alpha ^{(\frac{1}{2}K_{o})})^{2}+\sum \limits ^{\frac{1}{2}K_{o}-1}_{l=1}{K_{o}\atopwithdelims ()l}\alpha ^{(K_{o}-l)}\alpha ^{(l)},&{} K_{o}\, \text {is even,} \end{array}\right. } \end{aligned}$$

and combining (10) and (11) gives us

$$\begin{aligned}{} & {} -\nabla h(x_{s})(a_{K_{o}}(t,x) +b_{K_{o}}(t,x)u) - G_{s}(\{x^{(l)}_{s}\}; K_{o},\; r_{1})\nonumber \\{} & {} \quad + G_{\alpha }(\{\alpha ^{(l)}\}; K_{o}) + \alpha (t)\alpha ^{(K_{o})}(t) = 0. \end{aligned}$$

(12)

For general $n_{o}$, we define the following matrices for $t\ge t_{0}$ and $i \in \{1,\ldots ,n_{o}\}$:

$\alpha ^{(l)} \in \mathbb {R}^{n_{o}}$ such that $\alpha ^{(l)} = [\alpha ^{(l)}_{1};\ldots ;\alpha ^{(l)}_{n_{o}}]$, $\alpha ^{0} = \alpha $, and $\overrightarrow{\alpha } = [\alpha ; \alpha ^{(1)};\ldots ;\alpha ^{(K_{o}-1)}]$,
$D(x_{s}(t))\in \mathbb {R}^{n_{o}\times N_{s}}$ such that
$$\begin{aligned} D_{ij}(x_{s}(t)) = 2\left( \frac{x_{sj}(t)-{\bar{x}}_{ij}}{r^{2}_{ij}}\right) = {\partial _{x_{sj}}h_{i}(x_{s}(t))}, \end{aligned}$$
$\overrightarrow{G}_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r})\in \mathbb {R}^{n_{o}}$ where
$$\begin{aligned} \overrightarrow{G}_{s,i}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r}) = G_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r_{i}}), \end{aligned}$$
$\overrightarrow{G}_{\alpha }({\overrightarrow{\alpha }}; K_{o}) \in \mathbb {R}^{n_{o}}$ where
$$\begin{aligned} \overrightarrow{G}_{\alpha ,i}({\overrightarrow{\alpha }}; K_{o}) = G_{\alpha }({\{\alpha ^{(l)}_{i}\}^{K_{o}-1}_{l = 1}}; K_{o}). \end{aligned}$$

As in [23], we express $u = \varPhi (t, x, {\overrightarrow{\alpha }}, {\tilde{u}})$ with new control ${\tilde{u}} = \alpha ^{(K_{o})}$ and derive $\varPhi $ from the generalized version of (12) as

$$\begin{aligned} \begin{array}{rl} u&{}=\{D(x_{s})b_{K_{o}}(t,x)\}^{\dag }\\ &{}\{-D(x_{s})a_{K_{o}}(t,x) - \overrightarrow{G}_{s}(\{x^{(l)}_{s}\}; K_{o}) + \overrightarrow{G}_{\alpha }({\overrightarrow{\alpha }}; K_{o}) + I(\alpha ){\tilde{u}} \}. \end{array} \end{aligned}$$

(13)

With notation $\overrightarrow{\alpha } = [\alpha ; \beta _{1};\ldots ;\beta _{K_{o}-1}]$, we reach the following unconstrained problem

$$\begin{aligned} \begin{array}{c} (x(t),\overrightarrow{\alpha }(t)) \in \mathbb {X}\times \mathbb {A},\;\; {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })\subset \mathbb {R}^{n_{o}},\\ \inf \limits _{(x,{\tilde{u}})}\;{\tilde{J}}_{\infty }(t_{0},x,{\tilde{u}},\lambda )=\!\int ^{\infty }_{t_0}{\tilde{F}}_{\lambda }(s,x(s),\overrightarrow{\alpha }(s),{\tilde{u}}(s))ds\\ \end{array} \end{aligned}$$

(14)

such that

$$\begin{aligned} \begin{array}{rl} {\dot{x}}&{} = a(t,x)+b(t,x)\varPhi (t,x,\overrightarrow{\alpha },{\tilde{u}}), \\ {\dot{\alpha }}&{} = \beta _{1},\\ {\dot{\beta }}_{1}&{} = \beta _{2},\\ \vdots &{}\\ {\dot{\beta }}_{K_{o}-1}&{} = {\tilde{u}},\\ x(t_{0})&{}= x_{0},\\ \alpha (t_{0})&{}= \sqrt{-2{\tilde{h}}(x_{0})},\\ \beta _{1}(t_0)&{} = -\dot{{\tilde{h}}}(x_{0})/\alpha (t_{0}),\\ \beta _{2}(t_0)&{} = - (\beta ^{2}_{1}(t_0) + \ddot{{\tilde{h}}}(x_{0}))/\alpha (t_0),\\ \vdots &{}\\ \end{array} \end{aligned}$$

(15)

where $\mathbb {A}\subset \mathbb {R}^{K_{o}n_{o}}$ and ${\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }, {\tilde{u}}):= F_{\lambda }(t,x,\varPhi (t, x, \overrightarrow{\alpha },{\tilde{u}}))$. Here we note that $\alpha (t_0) \ne 0$ and that the expressions for $\alpha ,\; \beta _{1},\; \beta _{2},$ and the like are interpreted componentwise. We also define the corresponding Hamiltonian

$$\begin{aligned}{} & {} {\tilde{H}}(t,x,\overrightarrow{\alpha }, {\tilde{u}},p)\nonumber \\{} & {} \quad := {\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }, {\tilde{u}})+ \langle p, \big [{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}); [\beta _{1};\ldots ;\;\beta _{K_{o}-1};\;{\tilde{u}}] \big ]\rangle . \end{aligned}$$

(16)

From here, the following theorem can be applied with ${\tilde{F}}_{\lambda }$ and ${\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}):= a(t,x)+b(t,x)$ $\varPhi (t,x,\overrightarrow{\alpha },{\tilde{u}})$ in place of $F_{\lambda }$ and $f(t,x,u):=a(t,x)+b(t,x)u$, respectively. Observe that in both necessity and sufficiency, the existence of a global minimizer ${\tilde{u}}$ for ${\tilde{H}}$ is a condition of interest. In general, while existence can be established through some effort, uniqueness is not guaranteed.

Remark 4.1

It should be noted that above equations (6), (8), (9), and (12) hold along the trajectories with further details held in [18, 22]. Also, observe that for l, a and b satisfying the assumptions from Sect. 3, the function ${\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}}) = l(t,x,\varPhi (t,x,\overrightarrow{\alpha },{\tilde{u}}))$ is convex and Lipschitz in $(x,\overrightarrow{\alpha })$ and strictly convex in ${\tilde{u}}$. Consequently, ${\tilde{F}}_{\lambda }$ inherits the favorable properties of $F_{\lambda }$.

Remark 4.2

It is possible that a symmetric obstacle configuration exists such that multiple optimal control/trajectory pairs could exist for a particular example of (5). We note that although the map $\varPhi $ is affine in ${\tilde{u}}$, the Valentine transformation that changes (5) into (14)–(15) can allow for the multiple optimal pairs possible in a given example of (5) to be represented by the unique optimal pair for corresponding (14)–(15) via restrictions, projections, etc. (Further details will be included in the upcoming work.)

4.2 Weak Solutions to HJB

We now present the main theorem of this article, which establishes a unique weak solution of the HJB equation in the class of lower semicontinuous functions vanishing at infinity. For our purposes, the HJB equation is given as follows

$$\begin{aligned} {\left\{ \begin{array}{ll} -\partial _{t}V(t,x,\overrightarrow{\alpha }) + \sup \limits _{{\tilde{u}}\in {\tilde{U}}}{\tilde{H}}(t,x,\overrightarrow{\alpha }, {\tilde{u}},-\nabla _{x,\overrightarrow{\alpha }}V(t,x,\overrightarrow{\alpha })) = 0, &{} \text {on } (0,\infty )\times \mathbb {X}\times \mathbb {A},\\ \lim \limits _{t\rightarrow \infty }\sup _{(y,\overrightarrow{\beta })\in \text {dom }V(t,\cdot ,\cdot )}|V(t,y,\overrightarrow{\beta })| = 0. \end{array}\right. } \end{aligned}$$

(17)

Theorem 4.1

For the HJB equation derived from (14)–(15), there exists a unique viscosity solution ${\hat{V}}$ vanishing at infinity. In addition, if ${\tilde{H}}$ is continuous and convex in ${\tilde{u}}$, ${\tilde{U}}(t,x,\overrightarrow{\alpha })$ is a compact and convex set and $K_{o} = 1$, then there exists a unique control ${\hat{u}}$.

Proof

In this proof, we first establish uniqueness of ${\hat{V}}$ by showing that (14)–(15) satisfies a set of assumptions and by using a proof in the style of Proposition 2 from the Appendix. Afterward, we establish existence and uniqueness of the control ${\hat{u}}$ based on the properties of the Hamiltonian ${\tilde{H}}$.

Existence and Uniqueness of the viscosity solution ${\hat{V}}$

For existence and uniqueness of ${\hat{V}}$, we check our problem against the assumptions required in Proposition 2.

A1
We note that both ${\tilde{f}}$ and ${\tilde{F}}_{\lambda }$ are continuous in $(t,{\tilde{u}})$ for all $x\in \mathbb {X},\, \overrightarrow{\alpha }\in \mathbb {A}$ and thus Lebesgue-Borel measurable in those arguments. For $\phi \in L^{1}([0,\infty );\mathbb {R})$, let us choose $\phi (t) = {\tilde{l}}_{-}e^{-\lambda t}$, where ${\tilde{l}}_{-} = \inf \limits _{(t,x,\overrightarrow{\alpha },{\tilde{u}})}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}})|$.
A2
We need to establish a growth bound for a.e. $t>0$ and for all $(x,\overrightarrow{\alpha })\in \mathbb {X}\times ~ \mathbb {A},$ ${\tilde{u}}(t) \in {\tilde{U}}(t,x,\overrightarrow{\alpha })$. First, we rewrite the dynamics for easier computation of bounds. From (13), we rewrite the dynamic function ${\tilde{f}}$ as
$$\begin{aligned} {\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}) = A(t,x,\overrightarrow{\alpha })+B(t,x,\overrightarrow{\alpha }){\tilde{u}}(t), \end{aligned}$$
(18)
where
$$\begin{aligned} A(t,x,\overrightarrow{\alpha }) = a(t,x) + b(t,x)\{D(x_{s})b_{{K_o}}(t,x)\}^{\dag }\\ \left[ -\overrightarrow{G}_{s}(\{x^{(l)}_{s}(t)\}; K_{o},\; {r}) + G_{\alpha }({\overrightarrow{\alpha }}; K_{o}) - D(x_{s}){a_{K_{o}}(t,x)}\right] \end{aligned}$$
and
$$\begin{aligned} B(t,x,\overrightarrow{\alpha }) = b(t,x)\{D(x_{s})b_{{K_{o}}}(t,x)\}^{\dag }I(\alpha ). \end{aligned}$$
Going forward, we note that ${\tilde{f}}$ is bounded in all arguments (a.e. in t); thus, we have
$$\begin{aligned} |{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})| \le K_{{\tilde{f}}}(t)(1+|(x,\overrightarrow{\alpha })|). \end{aligned}$$
(19)
Next,
$$\begin{aligned} \begin{array}{ll} |{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}})|&{}\le \sup \limits _{(x,\overrightarrow{\alpha },{\tilde{u}})}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}})|e^{-\lambda t},\\ &{}\le K_{{\tilde{F}}_{\lambda }}(t)(1+|(x,\overrightarrow{\alpha })|), \end{array} \end{aligned}$$
(20)
where $ K_{{\tilde{F}}_{\lambda }}(t):= \sup \limits _{(x,\overrightarrow{\alpha },{\tilde{u}})}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\tilde{u}})|e^{-\lambda t}$. Thus, we take $c \in L^{1}_{loc}([0,\infty );\mathbb {R}^{+})$ as
$$\begin{aligned} c(t) = K_{{\tilde{f}}}(t)+K_{{\tilde{F}}_{\lambda }}(t). \end{aligned}$$
A3
Define set-valued maps ${\mathcal {F}}(t,x,\overrightarrow{\alpha }):= \{{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}): {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })\}$ and ${\mathcal {L}}(t,x,\overrightarrow{\alpha })$ $:= \{{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }, {\tilde{u}}): {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })\}$. Since the functions ${\tilde{f}}$ and ${\tilde{F}}_{\lambda }$ are continuous for $(x,\overrightarrow{\alpha })\in \mathbb {X}\times \mathbb {A}$ and t a.e., one has that the maps ${\mathcal {F}}$ and ${\mathcal {L}}$ are continuous. To show the closedness of images, consider the sequence $\{(x_{n},\overrightarrow{\alpha }_{n})\}\subset \mathbb {X}\times \mathbb {A}$ that converges to a point $({\bar{x}},\bar{\overrightarrow{\alpha }})\in \mathbb {X}\times \mathbb {A}$. It is clear that
$$\begin{aligned} \begin{array}{c} {\mathcal {F}}(t,x_{n},\overrightarrow{\alpha }_{n})\rightarrow {\mathcal {F}}(t,{\bar{x}},\bar{\overrightarrow{\alpha }}),\\ {\mathcal {L}}(t,x_{n},\overrightarrow{\alpha }_{n})\rightarrow {\mathcal {L}}(t,{\bar{x}},\bar{\overrightarrow{\alpha }}). \end{array} \end{aligned}$$
Convexity of the set
$$\begin{aligned} \{({\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}}); {\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}}) +r): {\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha }), r \ge 0\} \end{aligned}$$
in ${({\tilde{u}}, r)}$ follows from the convexity of ${\tilde{f}}$ and ${\tilde{F}}_{\lambda }$ in ${\tilde{u}}$ and the affine relationship of r.
A4
To satisfy the assumption A4 of Proposition 2, we break this item into finding suitable bounds for ${\tilde{F}}_{\lambda }$ and ${\tilde{f}}$. First, fix t and ${\tilde{u}}\in {\tilde{U}}(t,x,\overrightarrow{\alpha })$. Next, for $(x,\overrightarrow{\alpha }_{1})$ and $(y,\overrightarrow{\alpha }_{2})$, we proceed with the bounds on ${\tilde{F}}_{\lambda }$. With the assumptions on ${\tilde{l}}$, we have $|{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha }_{1},{\tilde{u}})-{\tilde{F}}_{\lambda }(t,y,\overrightarrow{\alpha }_{2},{\tilde{u}})| \le K_{{\tilde{F}}_{\lambda }}(t)|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|.$ In order to establish the appropriate bounds on ${\tilde{f}}$, we consider similar bounds on the matrix-valued functions A and B as given in A2. We note that since a and b are Lipschitz and bounded in x for a.e. $t \ge t_{0}$, we can establish Lipschitz bounds for A of the form
$$\begin{aligned} |A(t,x,\overrightarrow{\alpha }_{1})-A(t,y,\overrightarrow{\alpha }_{2})|\le K_{A}(t)|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|. \end{aligned}$$
For the function B, we have
$$\begin{aligned} \begin{array}{l} |(B(t,x,\overrightarrow{\alpha }_{1})-B(t,y,\overrightarrow{\alpha }_{2})){\tilde{u}}(t)|\le K_{B}(t){\tilde{u}}_{b}|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|. \end{array} \end{aligned}$$
Here ${\tilde{u}}_{b} = \sup \limits _{{\tilde{u}}\in {\tilde{U}}}|{\tilde{u}}|$ is justified due to a coercivity argument similar to the proof of Theorem 7.4.6 in [8] using point A2 and coercive properties of ${\tilde{F}}$ in ${\tilde{u}}$. Thus, we conclude
$$\begin{aligned} \begin{array}{l} |{\tilde{f}}(t,x,\overrightarrow{\alpha }_{1},{\tilde{u}})-{\tilde{f}}(t,y,\overrightarrow{\alpha }_{2},{\tilde{u}})|\le C_{{\tilde{f}}}(t)|(x,\overrightarrow{\alpha }_{1})-(y,\overrightarrow{\alpha }_{2})|,\\ \end{array} \end{aligned}$$
(21)
where $C_{{\tilde{f}}}(t) = K_{A}(t) + K_{B}(t){\tilde{u}}_{b}$. In other words, A4 is satisfied with
$$\begin{aligned} k(t) = K_{{\tilde{l}}}(t)e^{-\lambda t} + ~C_{{\tilde{f}}}(t). \end{aligned}$$
A5
To show that $k \in {\mathcal {L}}_{loc}$, we note that the suprema of $K_{{\tilde{l}}}(t),\,K_{A}(t),\text { and }K_{B}(t)$ are finite. Thus we can say $k(t) \le 3*\max \{\sup \limits _{t}K_{{\tilde{l}}}(t),\, \sup \limits _{t}K_{A}(t), \sup \limits _{t}K_{B}(t){\tilde{u}}_{b}\}=: K$. For $J \subset [t_{0},\infty )$ and $\sigma \in [t_{0},\infty )$, we have
$$\begin{aligned} \int _{J}k(\tau )d \tau \le K\mu (J), \end{aligned}$$
$$\begin{aligned} \sup \limits _{\{J:\mu (J)\le \sigma \}}\int _{J}k(\tau )d \tau \le \sup \limits _{\{J:\mu (J)\le \sigma \}} K\mu (J). \end{aligned}$$
(22)
Letting $\sigma $ approach zero shows that the RHS of (22) approaches zero and thus $k \in {\mathcal {L}}_{loc}$. Moreover, we get
$$\begin{aligned} \begin{array}{rl} \frac{1}{t} \int ^{t}_{t_0}(c(\tau ) +k(\tau ))d \tau &{}\le K\frac{t-t_{0}}{t} + \frac{1}{t}\int ^{t}_{t_0}c(\tau ) d\tau \\ &{}\le (K + c_{max})(1+\frac{t_{0}}{t}). \end{array} \end{aligned}$$
(23)
As t approaches infinity, we note that the right-hand side of (23) is finite.
A6
We know from A2 that for $t \ge 0$, closed set $\mathbb {X}$, and all $x\in \partial \mathbb {X}$, it follows that $|{\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})| + |{\tilde{F}}_{\lambda }(t,x,\overrightarrow{\alpha },{\tilde{u}})| \le c(t)(1+{\tilde{x}}_{m}),$ where ${\tilde{x}}_{m} = \sup \{|(x,\overrightarrow{\alpha })|: (x,\overrightarrow{\alpha })\in \partial \mathbb {X}\times \partial \mathbb {A}\}$. Taking the supremum over ${\tilde{u}}\in {\tilde{U}}$ yields the assumption A6 with $q(t) = c(t)(1+x_{m})$.
A7
By the structure of ${\tilde{F}}_{\lambda }$, the assumption A7 readily follows.
A8
The idea of A8 is to show that for a point $({\bar{x}},\overrightarrow{\alpha })\in \partial \mathbb {X}\times \partial \mathbb {A}$, there exists a trajectory that lies in the interior of $\mathbb {X}\times \mathbb {A}$ such that $({\bar{x}},\overrightarrow{\alpha })$ is reached. Prior to problem (14)–(15), we assumed that the dynamics ${\dot{x}} = a(t,x)+b(t,x)u$ are controllable (which satisfies A8 for (5)). Now we need to show that the controllability property holds also for ${\dot{x}} ={\tilde{f}}(t,x,\overrightarrow{\alpha },{\tilde{u}})$. Let us note that $\varPhi (t,x,\overrightarrow{\alpha },\cdot )$ is a closed map (in the fourth variable). Since $\varPhi $ is a closed map in ${\tilde{u}}$ and the original dynamics are controllable, there exists a control ${\tilde{u}}$ such that one can reach some final position $x_{f}\in \partial \mathbb {X}$ with a trajectory in the interior of $\mathbb {X}$ whose dynamics are governed by ${\tilde{f}}$.

Having established the assumptions A1-A8, it suffices to show that the value function ${\hat{V}}(t,x,\overrightarrow{\alpha }) =\int ^{\infty }_{t}{\tilde{l}}(s,x,\overrightarrow{\alpha },{\hat{u}})$ $e^{-\lambda s}ds$ has the following properties: (i) ${\hat{V}}$ is lower semicontinuous, (ii) Dom ${\hat{V}}(t,\cdot ,\cdot ) \ne \emptyset $ for all large $t>t_{0}$, and (iii)

$$\begin{aligned} \lim \limits _{t\rightarrow \infty }\sup \limits _{(y,\beta )\in \text {Dom }{\hat{V}}(t,\cdot ,\cdot )}|{\hat{V}}(t,y,\beta )| = 0. \end{aligned}$$

We know that ${\tilde{l}}$ is at least lower semicontinuous in $t,\; x \; \text { and }\overrightarrow{\alpha }$ and thus (i) is satisfied (e.g., see Remark 4.1). Similarly, since ${\tilde{l}}$ is bounded and defined in all necessary arguments, we have (ii). For (iii),

$$\begin{aligned} \lim \limits _{t\rightarrow \infty }\sup \limits _{(y,\beta )\in \text {Dom } {\hat{V}}(t,\cdot ,\cdot )}|{\hat{V}}(t,y,\beta )|\le \lim \limits _{t\rightarrow \infty }\int ^{\infty }_{t}{\tilde{l}}_{+}e^{-\lambda s}ds,\\ \end{aligned}$$

where ${\tilde{l}}_{+} = \sup \limits _{(t,x,\overrightarrow{\alpha })}|{\tilde{l}}(t,x,\overrightarrow{\alpha },{\hat{u}})|$. It is clear that the RHS converges to zero as $t\rightarrow \infty $ and thus we establish (iii). From here, we apply Proposition 2 with $(1) \implies (2)$ and have the value function ${\hat{V}}$ as the unique viscosity solution.

Existence and Uniqueness of ${\hat{u}}$

Existence and uniqueness of ${\hat{u}}$ depends on the structure of ${\tilde{H}}$ and ${\tilde{U}}(t,x,\overrightarrow{\alpha })$. First we show continuity and strict convexity of ${\tilde{H}}$ in ${\tilde{u}}$. Both of these conditions are connected to the continuity and convexity of $\varPhi $ in ${\tilde{u}}$. We note that function $\varPhi $ is affine in ${\tilde{u}}$ for all $(t,\,x,\,\overrightarrow{\alpha })$ and thus continuous and convex in ${\tilde{u}}$. By definitions of ${\tilde{F}}_{\lambda }$ and ${\tilde{f}}$, we know that ${\tilde{F}}_{\lambda }$ is strictly convex and ${\tilde{f}}$ convex in ${\tilde{u}}$. Both functions are also continuous in ${\tilde{u}}$. As a result, ${\tilde{H}}$ is continuous and strictly convex in ${\tilde{u}}$. When $K_{o} =1$, compactness and convexity of ${\tilde{U}}(t,x,\overrightarrow{\alpha })$ can be established (see ([23], Lemma 1)) and ${\tilde{H}}$ admits a unique minimizer ${\hat{u}}$. This concludes the proof. $\square $

Remark 4.3

Note that the classical necessary and sufficient conditions for existence of solutions to (5) without constraints can be expressed through PP ([15, pp. 102–103]) and HJB ([15, pp. 165]), respectively. We stress that the presented existence and uniqueness results are applied to problem (14)–(15) rather than problem (5).

5 Solving the Original Problem

5.1 Convergence to Original Infinite-Horizon Problem

Up to this point, we have analyzed (5). Now we show how to exploit (5) to solve the infinite-horizon version of (3), that is, we set $t_f = \infty $ and $E(x(t_f),x_f) \equiv 0$ in (3). Recall that it is possible that non-unique trajectory/control pairs $\{({\tilde{x}},{\tilde{u}})\}$ exist that solve infinite-horizon (3). Given $(x_{\lambda },u_{\lambda })$ as a solution of (5) as guaranteed by Theorem 4.1, we establish the conditions on $(x_{\lambda },u_{\lambda })$ to be a sufficient approximation for a solution of infinite-horizon (3).

Proposition 1

Consider the set of cost functions $\{J_{\infty }: \mathbb {X} \times U(t) \times [0,\infty ) \rightarrow \mathbb {R}\}$, where $J_{\infty }(x,u,\lambda ) = \int ^{\infty }_{t_{0}} l(s,x(s),u(s))$ $e^{-\lambda s}ds$ with dynamics and constraints from (5) and $J_{\infty }(x,u,0) = \int ^{\infty }_{t_{0}} l(s,x(s),u(s))ds$ (i.e., the cost function for the infinite-horizon variant of (3)). Let $(x_{\lambda },u_{\lambda })$ be the associated trajectory/control pair that minimizes $J_{\infty }(x,u,\lambda )$ and l(t, x, u) satisfy the conditions posed in Sect. 3. Then $J_{\infty }(x,u,\lambda )\xrightarrow {\varGamma }J_{\infty }(x,u,0)$ and there exists a trajectory/control pair $(x^{*}, u^{*})$ such that $(x_{\lambda }, u_{\lambda })\rightarrow (x^{*}, u^{*})$ uniformly and $(x^{*},u^{*})$ solves the infinite-horizon problem (3).

Proof

To show $\varGamma -$convergence of $J(x,u,\lambda )$, we consider the standard arguments based on Definition 2.2. Since $e^{-\lambda t} \le 1$ for $\lambda \in [0,\infty )$, we know

$$\begin{aligned} l(t,x(t),u(t))e^{-\lambda t} \le l(t,x(t),u(t)) \end{aligned}$$

and thus

$$\begin{aligned} \int ^{\infty }_{t_0}l(s,x(s),u(s))e^{-\lambda s} ds \le \int ^{\infty }_{t_0}l(s,x(s),u(s))ds. \end{aligned}$$

From here, it is clear that

$$\begin{aligned} J(x,u,0)\ge J^{sup}(x,u):=\limsup \limits _{\epsilon \rightarrow 0}\;J(x_{1,\epsilon }, u_{1,\epsilon }, \epsilon ) \end{aligned}$$

(24)

for some sequence $(x_{1,\epsilon }(t), u_{1,\epsilon }(t))\rightarrow (x(t),u(t))$ uniformly. To show the second inequality, we give the following argument. Given $\lambda _{n}\rightarrow 0$, (x, u) such that $J(x,u,0)< \infty $, and $(x_{n}, u_{n})\rightarrow (x,u)$ uniformly, then since l is nonnegative and continuous, we have for $t_{0}< T < \infty $

$$\begin{aligned} \int ^{\infty }_{t_{0}}e^{-\lambda _{n}t}l(t,x_{n}(t),u_{n}(t))dt \ge \int ^{T}_{t_{0}}e^{-\lambda _{n}t}l(t,x_{n}(t),u_{n}(t))dt. \end{aligned}$$

The uniform convergence of $(x_{n}, u_{n})$ to (x, u) on $[t_{0}, T]$, $\lambda _{n}\rightarrow 0$ and the continuity of l, yield

$$\begin{aligned} \int ^{T}_{t_{0}}e^{-\lambda _{n}t}l(t,x_{n}(t),u_{n}(t))dt \xrightarrow {n\rightarrow \infty } \int ^{T}_{t_{0}}l(t,x(t),u(t))dt \end{aligned}$$

which means that

$$\begin{aligned} \liminf _{n}\int ^{\infty }_{t_{0}}e^{-\lambda _{n}t}l(t,x_{n}(t),u_{n}(t))dt \ge \int ^{T}_{t_{0}}l(t,x(t),u(t))dt. \end{aligned}$$

Then with $T\rightarrow \infty $, one has

$$\begin{aligned} J^{inf}(x,u):= \liminf _{n}\int ^{\infty }_{t_{0}}e^{-\lambda _{n}t}l(t,x_{n}(t),u_{n}(t))dt \ge J(x,u,0). \end{aligned}$$

Third, we show that there exists a compact set C with $ \inf \limits _{(x, u)} J(x, u,\epsilon ) = \inf \limits _{(x, u)\in C} J(x,u,\epsilon )$ as $\epsilon > 0$. We note that, by the properties of the function l(t, x(t), u(t)) and the set $\mathbb {X}\times U(t)$, $J(x,u,\epsilon )$ is a bounded function. Thus for each $\epsilon > 0$, we can always find such a set C.

Owing to the previous part of the proof and since ${\dot{x}} = a(t,x) + b(t,x)u$ is controllable, there exists a pair $(x^{*}, u^{*})$ such that $\int ^{\infty }_{t_0}l(s, x^{*}(s), u^{*}(s)) ds$ is finite and minimal. By Theorem 4.1, for $\lambda > 0$, there exists an optimal trajectory/control pair $(x_{\lambda }, u_{\lambda })$ that solves the corresponding optimal control problem (5) with the corresponding $\lambda $ value.

Since $J(x,u,\lambda )\xrightarrow {\varGamma }J(x,u,0)$ and each pair $(x_{\lambda }, u_{\lambda })$ minimizes $J(x,u,\lambda )$ for a given $\lambda $, we reach $(x_{\lambda }, u_{\lambda })\rightarrow (x^{*}, u^{*})$ [11, Corollary 7.24]. $\square $

Remark 5.1

One can interpret Proposition 1 as the result of when $\lambda \rightarrow 0$ in the sense that we are determining the efficacy of $(x_{\lambda },u_{\lambda })$ when the cost function lacks the exponential decay in time (i.e., $\lambda = 0$).

Remark 5.2

Our approach is similar to the vanishing discount method of ergodic control theory. In [3] and [4], one requires constructing a Lyapunov function to show precompactness of the family of cost functions $\mathbb {J} = \{J_{\infty }(\cdot ,\cdot ,\lambda )\}$. In short, one must rely on the dynamics of the problem (typically a stochastic differential equation [9]) to obtain convergence for $\mathbb {J}$ and thus $\{(x_{\lambda }, u_{\lambda })\}$. In our results, once Theorem 4.1 is applied, convergence only depends on the properties of $\mathbb {J}$. With respect to state constrained problems [17], additional assumptions are required on the constraints. In this paper, state constraints are handled via the Valentine transformation and the results of Theorem 4.1, rather than with $\varGamma $-convergence. It should be noted that $\varGamma $-convergence can be used independently of any control framework, i.e., the behavior of a family functionals and their respective minimizers.

5.2 Approximation of Original Finite-Time Problem

For many practical problems, finite-horizon formulations are of interest, that is, $t_f$ is finite and $E(\cdot ) \not \equiv 0$ in (3). Essentially, we wish to form a relation between the infinite-time horizon problem (5) and the finite-time version of problem (3). Consider the new running cost functional $l'(t,x,u,\lambda )$ defined as

$$\begin{aligned} l'(t, x, u,\lambda ) =\;\Bigg \{ \begin{array}{rcl} l(t,x,u)e^{-\lambda t} &{}\text {if} &{} t\in [t_{0}, t_{f}],\\ \gamma (t,u,\lambda )E(x(t_{f}), x_{f})e^{-\lambda t} &{} \text {if} &{} t > t_{f},\\ \end{array} \end{aligned}$$

(25)

and thus

$$\begin{aligned} \begin{array}{rcl} &{}&{}\int ^{\infty }_{t_{0}}l'(s,x(s),u(s),\lambda )ds\\ &{}&{} \quad =\int ^{t_f}_{t_{0}}l(s,x(s),u(s))e^{-\lambda s}ds +\int ^{\infty }_{t_f}\gamma (s,u,\lambda )E(x(t_{f}), x_{f})e^{-\lambda s}ds. \end{array} \end{aligned}$$

(26)

We choose $\gamma (t,u,\lambda )$ such that Theorem 4.1 holds and Proposition 1 is satisfied with the necessary modifications (e.g., restriction of one’s interest to the interval $[t_0,t_f]$). For example, we use the expression $\gamma (u,\lambda ) = \lambda e^{\lambda t_f}(u^{T}R_{s}u+1)$, where $R_s$ is a positive definite matrix. In general, the function $\gamma $ is not unique, so its choice is to personal preference. In order to assess the proposed approximation numerically, we consider the function

$$\begin{aligned} E_{D}(\lambda , t_{f}):= \frac{|J_{\infty }(t_{0},{\hat{x}},{\hat{u}},\lambda ) - J(t_{0},t_{f}, {\hat{x}}_{f},{\hat{x}},{\hat{u}})|}{J(t_{0},t_{f}, {\hat{x}}_{f},{\hat{x}},{\hat{u}})} \end{aligned}$$

(27)

with $({\hat{x}}, {\hat{u}})$ as the trajectory/control pair which minimizes $J_{\infty }$. The function $E_{D}$ acts as a (relative) error function between the given cost function J and the infinite-horizon approximation $J_{\infty }$.

6 Examples

Suppose we have a bounded and convex set $\mathbb {X}$ as defined before. We want an autonomous agent to reach a final state $x_f \in \mathbb {X}$ starting from an initial state $x_0 \in \mathbb {X}$ while minimizing some cost and staying within $\mathbb {X}$. In addition, $\mathbb {X}$ may contain obstacles that the autonomous agent must avoid and there may be constraints imposed on the motion (e.g., related to agent path, velocity, or acceleration) and performance of the autonomous agent (e.g., time required to traverse $\mathbb {X},$ fuel usage). This section deals with numerical results regarding computation of solutions for such optimal control problems using DIDO. We introduce three path planning examples corresponding to constrained LQR, full nonlinear UAV dynamics, and dynamics related to the AUV. In addition we include finite-time applications and calculations of the suitable error function $E_{D}$. For computation purposes, we use MATLAB R2018b on a PC using Windows 10, equipped with a 2.10GHz Intel(R) Xeon(R) Silver 4110 CPU.

6.1 Optimal Path Planning for a LQR Type Problem with Constraints

Let us consider the following LQR problem

$$\begin{aligned} \begin{array}{rl} \inf \limits _{(x,u)}\;\; J(t_0, t_{f}, x,u)= &{} \int ^{t_{f}}_{t_{0}}x(s)^{T}Qx(s) + u(s)^{T}Ru(s)ds\\ \end{array} \end{aligned}$$

(28)

subject to

$$\begin{aligned} \begin{array}{rcl} {\dot{x}}(t)&{}=&{}Ax(t) + Bu(t),\\ (x(t_{0}), x(t_{f})) &{}=&{} (x_{0}, x_{f}). \end{array} \end{aligned}$$

(29)

We define $J_{\infty }(t_{0}, x, u, \lambda )\!:=\! \int ^{\infty }_{t_{0}}(x(s)^{T}Qx(s)\!+\!u(s)^{T}Ru(s))e^{-\lambda s}ds$ and use this in the definition of $E_{D}$.

We select the following values: $A = [0\; 1; 2\; -1]$, $B = [0; 1]$, $Q = [1\;\; 0; 0\;\; 0.25]$, $R = 0.5$, $x_{0} = [4;4]$, and $x_{f} = [0;0]$. We also include constraints $U(t) = \{u(t) \in \mathbb {R}: -20 \le u(t) \le 20\}$, $h_{1}(x(t)) = (x_{1}(t) - 2)^{2} + (x_{2}(t)-1)^{2}$, $h_{2}(x(t)) =(\frac{x_{1}(t) - 7}{2})^{2} + (x_{2}(t)-1)^{2}$, $h^{L} = [1.1; 1.1]$.

We notice from Fig. 1.a that the given trajectory satisfies the imposed state constraints and, from Fig. 1.b, we see that the control signal is confined to $[-20, 20]$. The somewhat unusual path in Fig. 1 stems from the dynamics of the problem given by (28)–(29). We include values comparing infinite- and finite-horizon LQR with obstacles. For the finite-time problem, let us choose $t_f = 6$s as the transient duration is about 6.9s. In addition, we consider $R_s = 0.01$ in our choice for $\gamma $. For infinite-time problem, we calculate $J_{\infty }$ directly with $l'(t,x(t),u(t),\lambda )$ as defined in (26). Resulting calculations are found in Table 1. As demonstrated, the smaller the value of $\lambda ,$ the more accurate approximation is obtained, which is in accordance with Sect. 5.2.

Table 1 Relations between $\lambda $, $E_{D}(\lambda ,t_{f})$, and computational runtimes

Full size table

6.2 Optimal Path Planning for a Fully Nonlinear Model of an AUV in 3D with Obstacles

With respect to the more common notation of rigid-body models found in literature, this example uses $\omega $ for the state and $\tau $ for control u. Consider the problem

$$\begin{aligned} \begin{array}{rl} \inf \limits _{(\omega ,\tau )}\;\; J(t_0, t_{f}, \omega ,\tau )= &{} \int ^{t_{f}}_{t_{0}}\omega (s)^{T}Q\omega (s) + \tau (s)^{T}R\tau (s)ds\\ \end{array} \end{aligned}$$

(30)

subject to

$$\begin{aligned} \begin{array}{rcl} \dot{\omega _{s}}&{}=&{}m^{-1}(\hat{\omega _{s}}(\omega _{r}) - D_{s}\omega _{s}+ \tau _{s}),\\ \dot{\omega _{r}}&{}=&{} I^{-1}_{CG}(\widehat{(I_{CG}\omega _{r})}(\omega _{r}) - D_{r}\omega _{r}+\tau _{r}),\\ (\omega (t_{0}), \omega (t_{f})) &{}=&{} (\omega _{0}, \omega _{f}), \end{array} \end{aligned}$$

(31)

with $\omega = [\omega _{s}; \omega _{r}]$ and $\tau = [\tau _{s}; \tau _{r}]$ where $\omega _{s}=[x; y; z] \in \mathbb {R}^{3}$ represents the translation states, $\omega _{r}=[\rho ; \theta ; \psi ] \in [-\pi , \pi ]^{3}$ represents the rotation states (measured in radians) and controls $\tau _{s},\; \tau _{r}\in \mathbb {R}^{3}$. We define m as the mass of the AUV, $3\times 3$ matrices $D_{s},\; D_{r}$ are damping matrices, and $I_{CG}$ is the inertia matrix which is symmetric and positive definite of the form

$$\begin{aligned} I_{CG} = \begin{bmatrix} I_{x} &{} -I_{xy} &{} -I_{xz}\\ -I_{yx}&{} I_{y} &{} -I_{yz}\\ -I_{zx} &{} -I_{zy} &{} I_{z} \end{bmatrix}. \end{aligned}$$

We also include constraints

$$\begin{aligned} \begin{array}{rcl} (\frac{x-8.3}{1.7})^{2} + (\frac{y-8.3}{1.7})^{2} + (\frac{z-8}{1.7})^{2}&{}\ge &{} 1,\\ (\frac{x-1.7}{1.7})^{2} + (\frac{y-1.7}{1.7})^{2} + (\frac{z-2}{1.7})^{2}&{}\ge &{} 1,\\ (\frac{x-3}{2.7})^{2} + (\frac{y-6.5}{2.7})^{2} + (\frac{z-5}{2.7})^{2}&{}\ge &{} 1,\\ (\frac{x-7.3}{2.7})^{2} + (\frac{y-2.7}{2.7})^{2} + (\frac{z-5}{2.7})^{2}&{}\ge &{} 1.\\ \end{array} \end{aligned}$$

(32)

We define constraints (32) in the form $I^{-1}(h_{L})h(x_{s}) \ge I^{-1}(h_{L})h_{L}$ for ease of numerical computation and similarly do so for the remaining example. Furthermore, $h_{L} = [1.7^{2}; 1.7^{2}; 2.7^{2}; 2.7^{2}]$ is understood with $d_{i} = 0.7$ (0.5 accounts for the radius of AUV, 0.2 gives additional buffing from obstacles; all values are given in meters). Define spaces $\mathbb {X} = [0,10]^{3}\times [-\pi , \pi ]^3$ and $\mathbb {U}(t) = [-50, 50]^{6}$. We set values $Q = \text { diag}\{{\textbf {1}}_{3\times 1}; {{\textbf {50}}}_{3\times 1}\}$, $R = \text { diag}\{1;10;10;1; 5; 10\}$, $m = 10,$ $I_{x} = I_{y} = 3.45$, $I_{xz} = I_{zx} = -1.28\times 10^{-15}$, $I_{z} = 2.2$, $I_{xy} = I_{yx} = I_{yz} = I_{zy} = 0$, $D_{s} = \text { diag}\{0.1; 5; 5\}$, $D_{r} = \text { diag}\{5; 5; 0.1\}$, $\omega _{0} = [{{\textbf {10}}}_{3\times 1};{{\textbf {0}}}_{3\times 1}]$, and $\omega _{f}= {{\textbf {0}}}_{6\times 1}$. The values for matrix $I_{CG}$ are taken from the specifications in [10].

For finite-time problem approximation, we choose $t_{f} = 10$ and also take into consideration the number of algorithm nodes ($a_{n}$) used for calculation with DIDO. In DIDO, $a_{n}$ relates to the discretization of the time interval $[t_{0},\, t_{f}]$ for computation of cost function $J(t_0, t_{f}, \omega ,\tau )$. Since model (30)–(31) has $E(x(t_f),x_{f}) = 0$, the choice of function $\gamma $ is unnecessary. However, for completeness, we take $\gamma (t, u,\lambda )= \lambda e^{\lambda t_f}(u^{T}R_{s}u+1)$ with matrix $R_{s} = 0.02I_{6}.$

Simulation results for the path planing problem in 3D with 4 obstacles for an AUV are shown in Figs. 2 and 3. As we observe in Fig. 2, as $\lambda $ approaches 0, both translations and rotations converge to the resulting finite-time behavior. This behavior is also reflected in Table 2 with respect to $E_{D}$ for both $a_{n} = 50$ and $a_{n} = 20$.

6.3 Optimal Path Planning for a Fully Nonlinear Model of an UAV in 3D with Obstacles

Previously, we considered examples where uniqueness of control could be established, i.e., $K_o = 1$. Now, we look at example in which that is not guaranteed. Consider the quadrotor example derived from [14]:

$$\begin{aligned} \begin{array}{rl} \inf \limits _{(\omega ,\tau )}\;\; J(t_0, t_{f}, \omega ,\tau )= &{}\omega _{sf}^{T}Q_{f}\omega _{sf} + \int ^{t_{f}}_{t_{0}}\omega (s)^{T}Q\omega (s) + \tau (s)^{T}R\tau (s)ds\\ \end{array} \end{aligned}$$

(33)

subject to

$$\begin{aligned} \begin{array}{rcl} \dot{\omega _{s}}&{}=&{} \omega _{v},\\ \dot{\omega _{v}}&{}=&{} ge_{3} - m^{-1}M_{\omega _{r}}\tau ,\\ \dot{\omega _{r}}&{}=&{} S(\omega _{rv})\omega _{r},\\ \dot{\omega _{rv}}&{}=&{} I^{-1}_{CG}(\widehat{(I_{CG}\omega _{rv})}(\omega _{rv}) +\varLambda \tau ),\\ (\omega (t_{0}), \omega (t_{f})) &{}=&{} (\omega _{0}, \omega _{f}). \end{array} \end{aligned}$$

(34)

Here $\omega = [\omega _{s}; \omega _{v}; \omega _{r}; \omega _{rv}]$ with $\omega _{s}$ as defined in the previous example, $\omega _{v}\in \mathbb {R}^{3}$ as the translational velocities, $\omega _{r}\in [-1,1]^{9}$ as the rotational states, $\omega _{rv} = [{\dot{\rho }}; {\dot{\theta }}; {\dot{\psi }}]\in \mathbb {R}^{3}$ as angular velocities and control $\tau \in \mathbb {R}^{4}$. Furthermore, g is acceleration due to gravity (in $m/s^2$), $M_{\omega _{r}}$ is a $3\times 4$ matrix with columns $M_{\omega _{r}}(:,i) = [\omega _{r}]_{3}e_{3}$ for $i \in \{1,2,3,4\}$, $e_{3} = [0;0;1]$ and $\varLambda $ is $3\times 4$ matrix of the form

$$\begin{aligned} \varLambda = \begin{bmatrix} 0 &{} -d &{} 0 &{} d\\ d &{} 0 &{} -d &{} 0\\ -c &{} c &{} -c &{} c \end{bmatrix}, \end{aligned}$$

with d as the distance from the center of mass to the center of each rotor (in $xy-$plane) and nonzero constant c.

Table 2 Relations between $\lambda $, $E_{D}(\lambda ,t_{f})$, and computational runtimes with $a_{n} = 50$ and $a_{n}= 20$

Full size table

The given constraints for this example are

$$\begin{aligned} \begin{array}{rcl} (\frac{x-3}{1.7})^{2} + (\frac{y-9}{1.7})^{2} + (\frac{z-7.5}{1.7})^{2}&{}\ge &{} 1,\\ (\frac{x-4}{2.7})^{2} + (\frac{y-4}{2.7})^{2} + (\frac{z-5}{2.7})^{2}&{}\ge &{} 1,\\ (\frac{x-9}{1.7})^{2} + (\frac{y-3}{1.7})^{2} + (\frac{z-3}{1.7})^{2}&{}\ge &{} 1.\\ \end{array} \end{aligned}$$

(35)

We take some time to explain the structure of $\omega _{r}$ dynamics. In usual quadrotor models, the rotation states - pitch, roll, and yaw - are represented in a standard rotation matrix $R(\rho , \theta , \psi )$. Unfortunately, the modeling dynamics for $(\rho , \theta , \psi )$ in this form is quite complicated. To work around this, we instead work with a quaternion variant of this rotation matrix. This allows the rotation dynamics to be written as linear system in the new rotation states - see $\omega _{r}$ dynamics. To recover $\rho $, $\theta $, and $\psi $, we can use a suitable transformation that relates to $\omega _{r}$. For more information, see ([12], Sections 2.2.3–2.2.4).

In addition, we relate the dimension of the control for the quadrotor to that of the AUV. In standard literature, systems are modeled with six control inputs. We note that in the previous example, $\tau $ has six components whereas quadrotor has four inputs. To coincide with the norm, we notice that we can define a new control $\tau _{new} = [\tau _{s}; \tau _{r}] \in \mathbb {R}^{6}$, where $\tau _{s}=M_{b}\tau $ and $\tau _{r} = \varLambda \tau $.

Define spaces $\mathbb {X} = [0,10]^{3}\times [-20, 20]^3\times [-1, 1]^9\times [-20\pi , 20\pi ]^3$ and $\mathbb {U}(t) = [-50, 50]^{4}$. We set values $m = 4.34 (kg)$, $g = 9.8 (m/s^{2})$, $d = 0.315 (m)$, $c = 8.004\times 10^{-4} (m)$, $Q_{f} = I_{3}$, $Q = \text { diag}\{{{\textbf {0}}.{\textbf {1}}}_{3\times 1}; {{\textbf {1}}}_{3\times 1};$ ${{\textbf {0}}}_{9\times 1}; {{\textbf {0}}.{\textbf {5}}}_{3\times 1}\}$, $R = I({\textbf {0}}.{\textbf {5}}_{4\times 1})$, $I_{x} = 2\times 10^{-3}$, $I_{y} = 8.45\times 10^{-2}$, $I_{z} = 0.1377$, $I_{xz} = I_{zx} = I_{xy} = I_{yx} = I_{yz} = I_{zy} = 0$, $\omega _{s0} = [0; 10; 10]$, $\omega _{sf}= [10;0;0]$, $\omega _{v0} = \omega _{vf}= {\textbf {0}}_{3\times 1}$, $[\omega _{r0}]_{3} = [\omega _{rf}]_{3} = I_{3}$, and $\omega _{rv0} = \omega _{rvf} ={\textbf {0}}_{3\times 1}$.

For finite-time problem approximation, we choose $t_{f} = 4$. Unlike the previous examples, here we have a nonzero endpoint function. Thus, we choose matrix $R_{s} = 0.2I_{4}$ and notice behavior (Fig. 4 and Table 3) similar to the previous example. We note that model (33)–(35) has $K_{o} = 2$ (see Sect. 4.1). As thus, uniqueness of ${\tilde{u}}$ cannot be guaranteed for any of the $\lambda -$approximations of (33)–(35). For more details, see Remark 4.2.

Table 3 Relations between $\lambda $, $E_{D}(\lambda ,t_{f})$, and computational runtimes with $a_{n}= 50$

Full size table

7 Conclusions and Future Work

We have extended uniqueness and existence results from [23] for a class of optimal control problem with constraints employing results for the HJB equation and Pontryagin’s Principle as well as the use of $\varGamma $-convergence.

The principal research avenue is to investigate stability and robustness of the solutions obtained via the proposed framework. In other words, we do not want to a priori assume that the trajectories live in a bounded set, but the trajectory boundedness for initial conditions of interest ought to be obtained via the proposed optimization. Also, we plan to extend current results to a coverage problem for autonomous agents with sensors such as cameras, sonars or lidars. We may also consider dynamic obstacles (e.g., agents that intersect with floating debris or local wildlife).

Data Availability

We do not analyze or generate datasets owing to the nature of this work. The code could be shared upon a reasonable request.

References

Bardi, M., Dolcetta, I.C.: Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Birkhäuser, Basel (2008)
Google Scholar
Basco, V., Frankowska, H.: Hamilton–Jacobi–Bellman equations with time-measurable data and infinite horizon. Nonlinear Differ. Equ. Appl. 26, 7 (2019)
Article MathSciNet Google Scholar
Borkar, V., Budhiraja, A.: Ergodic control for constrained diffusions: characterization using HJB equations. SIAM J. Control. Optim. 43(4), 1467–1492 (2004)
Article MathSciNet Google Scholar
Borkar, V., Budhiraja, A.: HJB equations for ergodic control problems for constrained diffusions in polyhedral domains. In: IEEE Conference on Decision and Control, pp. 6668–6672 (2005)
Bressan, A., Piccoli, B.: Introduction to the Mathematical Theory of Control. American Institute of Mathematical Sciences, New York (2007)
Google Scholar
Busoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I.: Reinforcement learning for control: performance, stability, and deep approximators. Annu. Rev. Control. 46, 8–28 (2018)
Article MathSciNet Google Scholar
Camacho, E.F., Bordons, C.: Model Predictive Control. Springer, London (2007)
Book Google Scholar
Cannarsa, P., Sinestrari, C.: Semiconcave Functions, Hamilton–Jacobi Equations, and Optimal Control. Birkhäuser, Basel (2004)
Book Google Scholar
Campillo, F.: Optimal ergodic control of nonlinear stochastic systems. In: Kree, P., Wedig, W. (eds.) Probabilistic Methods in Applied Physics, vol. 451, pp. 239–269. Springer Berlin Heidelberg, Berlin (1995)
Chapter Google Scholar
Chen, J., Sun, C., Zhang, A.: Autonomous navigation for adaptive unmanned underwater vehicles using fiducial markers. In: IEEE International Conference on Robotics and Automation, pp. 9298–9304 (2021)
Dal Maso, G.: An Introduction to $\Gamma $-Convergence. Birkhäuser, Basel (1993)
Book Google Scholar
Fossen, T.I.: Handbook of Marine Craft Hydrodynamics and Motion Control. Wiley, New York (2011)
Book Google Scholar
Gorges, D.: Relations between model predictive control and reinforcement learning. IFAC-PapersOnLine 50(1), 4920–4928 (2017)
Article Google Scholar
Lee, T., Leok, M., McClamroch, N.H.: Geometric tracking control of a quadrotor UAV on $SE(3)$. In: IEEE Conference on Decision and Control, pp. 5420–5425 (2010)
Liberzon, D.: Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University, Princeton (2012)
Book Google Scholar
Marsh, H. C., Karpenko, M., Gong Q.: A pseudospectral approach to high index DAE optimal control problems. arXiv:1811.12582 (2018)
Picarelli, A.: On Some Stochastic Control Problems with State Constraints. Ecole polytechnique X (2015)
Redfern, D.A., Goh, C.J.: Feedback control of state constrained optimal control problems. In: Doležal, J., Fidler, J. (eds.) System Modelling and Optimization. IFIP—The International Federation for Information Processing, pp. 442–449. Springer, Boston (1996)
Google Scholar
Ross, I.M.: A Primer on Pontryagin’s Principle in Optimal Control. Collegiate Publishers, London (2015)
Google Scholar
Ross, I. M.: Enhancement to the DIDO optimal control toolbox. arXiv:2004.13112 (2020)
Ross, I.M., Gong, Q., Fahroo, F., Kang, W.: Practical stabilization through real-time optimal control. American Control Conference, pp. 304–309 (2006)
Valentine, F.A.: The problem of Lagrange with differential inequalities as added side conditions. In: Giorgi, G., Kjeldsen, T. (eds.) Traces and Emergence of Nonlinear Programming, pp. 313–357. Birkhäuser, Basel (2014)
Chapter Google Scholar
Weston, J., Tolić, D., Palunko, I.: Mixed use of Pontryagin’s principle and the Hamilton-Jacobi-Bellman equation in infinite- and finite-horizon constrained optimal control. In: Petrović, I., Menegatti, E., Marković, I. (eds.) Intelligent Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol. 577, pp. 167–185 (2023)

Download references

Acknowledgements

This work is supported by SeaClear, European Union’s Horizon 2020 research and innovation programme under grant agreement No. 871295, and SeaClear2.0, European Union’s Horizon Europe innovation action programme under grant agreement No. 101093822. The authors also thank the reviewers for their insights and suggestions for the improvement of this paper.

Author information

Authors and Affiliations

LARIAT-UNIDU – Laboratory of intelligent and autonomous systems at University of Dubrovnik, Dubrovnik, Croatia
Jerome Weston, Domagoj Tolić & Ivana Palunko
University of Dubrovnik, Ćira Carića 4, 20000, Dubrovnik, Croatia
Jerome Weston & Ivana Palunko
RIT Croatia, Don Frana Bulića 6, 20000, Dubrovnik, Croatia
Domagoj Tolić

Authors

Jerome Weston
View author publications
You can also search for this author in PubMed Google Scholar
Domagoj Tolić
View author publications
You can also search for this author in PubMed Google Scholar
Ivana Palunko
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Domagoj Tolić.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Fausto Gozzi.

Appendix

In order to make this article as self-contained as possible and for the reader’s convenience, we establish some additional notation needed for this section. Let $B_{r}(x)$ be the closed ball in $\mathbb {R}^{k}$ of radius $r > 0$ centered at $x\in \mathbb {R}^{k}$. For a nonempty set $C \subset \mathbb {R}^{k}$, we denote its convex hull by co(C).The negative polar cone of C is defined as $C^{-}:= \{p \in \mathbb {R}^{k}: \langle p,c\rangle \le 0 \text { for all } c\in C\}$. Denote the set distance from $x\in \mathbb {R}^{k}$ to C by $d_{C}(x):= \inf \{|x-y|:\; y\in C\}$.

For a nonempty set $C \subset \mathbb {R}^{k}$, collection of nonempty subsets $\{C_{w}\}_{w\in C_{2}}$, and $w_{0}\in C_{2}$, let

$$\begin{aligned} \text {Lim}\sup \limits _{w\rightarrow w_{0}}C_{w} = \{v\in \mathbb {R}^{k}:\; \liminf \limits _{w\rightarrow w_{0}}\;d_{C_{w}}(v)=0\} \end{aligned}$$

and

$$\begin{aligned} \text {Lim} \inf \limits _{w\rightarrow w_{0}}C_{w} = \{v\in \mathbb {R}^{k}:\; \limsup \limits _{w\rightarrow w_{0}}\;d_{C_{w}}(v)=0\}. \end{aligned}$$

For $x \in {\overline{C}} $ define $T_{C}(x):= \{v\in \mathbb {R}^{k}: \liminf \limits _{h\rightarrow 0^{+}}\;d_{A}(x+hv)/h = 0 \}$ and $N_{C}(x):= \text {Lim} \sup \limits _{y\rightarrow C^{x}}T_{C}(y)^{-}$. Finally, denote the epigraph of a function $\varphi $ as epi $\varphi $.

Below we state the necessary assumptions and proposition required for the proof of Theorem 4.1.

A1
For all $x \in \mathbb {R}^{n}$, the mappings ${\tilde{f}}(\cdot , x, \cdot )$ and ${\tilde{F}}(\cdot , x, \cdot )$ are Lebesgue-Borel measurable and there exists $\phi \in L^{1}([0,\infty );\mathbb {R})$ such that ${\tilde{F}}(t,x,u)\ge \phi (t)$ for a.e. $t \ge 0$ and all $(x,u) \in \mathbb {R}^{n}\times \mathbb {R}^{m}$.
A2
There exists a function $c \in L^{1}_{loc}([0,\infty ); \mathbb {R}^{+})$ such that for a.e. $t \ge 0$ and for all $x \in \mathbb {R}^{n},\; u \in U(t)$
$$\begin{aligned} |{\tilde{f}}(t,x,u)| + |{\tilde{F}}(t,x,u)| \le c(t)(1+|x|). \end{aligned}$$
A3
For a.e. $t \ge 0$ and all $x \in \mathbb {R}^{n}$, the set-valued map
$$\begin{aligned} y \rightarrow \{({\tilde{f}}(t,y,u), {\tilde{F}}(t,y,u)): u\in U(t)\} \end{aligned}$$
is continuous with closed images, and the set
$$\begin{aligned} \{({\tilde{f}}(t,x,u), {\tilde{F}}(t,x,u)+r): u \in U(t),\; r\ge 0 \} \end{aligned}$$
is convex.
A4
There exists a function $k \in L^{1}_{loc}([0,\infty ); \mathbb {R}^{+})$ such that for a.e. $t\ge 0$ and for all $x,y\in \mathbb {R}^{n},\; u \in U(t)$: $|{\tilde{f}}(t,x,u) - {\tilde{f}}(t,y,u)| + |{\tilde{F}}(t,x,u)-{\tilde{F}}(t,y,u)| \le k(t)|x-y|.$
A5
$k\in {\mathcal {L}}_{loc}$ and $\limsup _{t\rightarrow \infty } \frac{1}{t}\int ^{t}_{0}(c(s)+k(s))ds < \infty .$
A6
There exists a function $q\in {\mathcal {L}}_{loc}$ such that for a.e. $t \ge 0$
$$\begin{aligned} \sup \limits _{u\in U(t)}(|{\tilde{f}}(t,x,u)|+|{\tilde{F}}(t,x,u)|) \le q(t),\; \text {for all } x\in \partial C, \end{aligned}$$
where $C \subset \mathbb {R}^{n}$ is closed.
A7
Let $V: [0,\infty )\times C \rightarrow \mathbb {R}\cup \{\pm \infty \}$ be the value function of the respective HJB equation with endpoint conditions. Then assume Dom $V \ne \varnothing $ and there exist $T > 0$ and $\psi \in L^{1}([T,\infty ); \mathbb {R}^{+})$ such that for all $(t_{0}, x_{0}) \in \text {Dom }V \cap ([T,\infty )\times \mathbb {R}^{n})$ and any feasible trajectory-control pair $(x(\cdot ), u(\cdot ))$ on $[t_{0}, \infty ),$ with $x(t_{0}) = x_{0},$
$$\begin{aligned} |{\tilde{F}}(t, x(t),u(t))| \le \psi (t)\text { for a.e. } t\ge t_{0}. \end{aligned}$$
A8
There exist $\eta ,\; r > 0$ and $M \ge 0$ such that for a.e. $t > 0$ and any $y\in \partial C + \eta B_{1}({{\textbf {0}}}),$ and any $v \in {\tilde{f}}(t,y,U(t)),$ with $\inf \limits _{n\in N^{1}_{y,\eta }}\langle n,v \rangle \le 0,$ we can find $w \in {\tilde{f}}(t,y,U(t))\cap B_{M}(v)$ satisfying
$$\begin{aligned} \inf _{n\in N^{1}_{y,\eta }}\{\langle n,w\rangle ,\; \langle n, w-v \rangle \}\ge r, \end{aligned}$$
where $N^{1}_{y,\eta }:= \{n \in \partial B_{1}({{\textbf {0}}}): n \in {\overline{co}}N_{C}(x),\; x \in \partial C\cap B_{\eta }(y)\}.$

Proposition 2

(Theorem 3.3, [2]) Let assumptions A1-A8 be satisfied and let $W:[0,\infty ) \times C \rightarrow \mathbb {R}\cup \{+\infty \}$ be a lower semicontinuous function such that $\text {Dom }V(t,\cdot )\subset \text {Dom }W(t,\cdot ) \ne \emptyset $ for all large $t > 0$ and

$$\begin{aligned} \lim _{t\rightarrow \infty }\sup \limits _{y\in \text {Dom }W(t,\cdot )}|W(t,y)| = 0. \end{aligned}$$

(36)

Then the following statements are equivalent:

1.
W = V;
2.
W is a weak (or viscosity) solution of HJB equation on $(0,\infty )\times C$ and $t \rightarrow \text {epi }W(t,\cdot )$ is locally absolutely continuous.

In addition, V is the unique weak solution satisfying (36) with locally absolutely continuous $t \rightarrow \text {epi }V(t,\cdot )$.

If we consider ${\tilde{F}}(t,x,u) = {\tilde{l}}(t,x,u)e^{-\lambda t}$ for $\lambda > 0$ and ${\tilde{l}}$ bounded, this result holds with V as the unique locally Lipschitz continuous solution [2, Prop 5.7].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Weston, J., Tolić, D. & Palunko, I. Application of Hamilton–Jacobi–Bellman Equation/Pontryagin’s Principle for Constrained Optimal Control. J Optim Theory Appl 200, 437–462 (2024). https://doi.org/10.1007/s10957-023-02364-4

Download citation

Received: 16 April 2022
Accepted: 28 November 2023
Published: 09 January 2024
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10957-023-02364-4

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Application of Hamilton–Jacobi–Bellman Equation/Pontryagin’s Principle for Constrained Optimal Control

Abstract

Similar content being viewed by others

UAV Path Planning Using Optimization Approaches: A Survey

A Real-time Fuzzy Interacting Multiple-Model Velocity Obstacle Avoidance Approach for Unmanned Aerial Vehicles

Intelligent fractional-order sliding mode control based maneuvering of an autonomous vehicle

1 Introduction

2 Preliminaries

Definition 2.1

Definition 2.2

2.1 Motivations for Using DIDO

3 Problem Formulation

4 Methodology

4.1 Application to State-Constrained Problem

Remark 4.1

Remark 4.2

4.2 Weak Solutions to HJB

Theorem 4.1

Proof

Remark 4.3

5 Solving the Original Problem

5.1 Convergence to Original Infinite-Horizon Problem

Proposition 1

Proof

Remark 5.1

Remark 5.2

5.2 Approximation of Original Finite-Time Problem

6 Examples

6.1 Optimal Path Planning for a LQR Type Problem with Constraints

6.2 Optimal Path Planning for a Fully Nonlinear Model of an AUV in 3D with Obstacles

6.3 Optimal Path Planning for a Fully Nonlinear Model of an UAV in 3D with Obstacles

7 Conclusions and Future Work

Data Availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

Proposition 2

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation