1 Introduction

Autonomous vessels are becoming increasingly important due to crew shortages, cost reduction, and safety considerations. In particular, the automation and autonomy of berthing maneuvers are significant issues because berthing maneuvering is one of the most stressful maneuvers seafarers undertake.

Numerous studies have been conducted on the automation of docking and berthing operations for surface vessels [1,2,3,4,5,6,7,8,9,10,11,12,13]. For a solution to the berthing control problem to be useful in practical situations, it must be generated in real time. This is challenging because of computational constraints and the complexity and uncertainty of ship maneuverability. Several studies have tackled this problem in two stages: planning and tracking/following. For instance, Shouji et al. [1, 2] discussed the optimal control problem for automatic berthing, proposed a numerical solution, and additionally proposed a feedback controller for tracking the solution. Ahmed et al. [3] proposed an artificial neural network controller for berthing; their approach used a reference line, referred to as an imaginary line, to plan the berthing maneuver. Bitar et al. [4] proposed an optimization-based trajectory planner for docking operations and used a dynamic positioning controller for trajectory tracking, and Martinsen et al. [5] addressed the presence of unmapped objects by using range sensors. Sawada et al. [6] proposed a berthing system that applied path-following and path-planning algorithms.

We focus on the approach using trajectory planning and trajectory tracking. Here, in this paper, a trajectory is defined as a time series of state variables, which are subject to spatial and temporal constraints that take into account the vessel dynamics. The generation of a trajectory is called trajectory planning, and the tracking of a given trajectory is called trajectory tracking. In particular, the desired trajectory of berthing maneuvering is called the berthing trajectory. By preparing the berthing trajectory in advance, it is possible to solve the berthing control problem in real time only via tracking the prepared trajectory.

However, tracking a predefined berthing trajectory is not straightforward because the vessel speed changes significantly, the effect of wind disturbance increases at low speed, and complex maneuvers, including backward and crabbing motions, are frequently required. Therefore, an essential task is to develop a tracking controller that can safely track a given berthing trajectory.

1.1 Related research

Considerable research has been conducted on trajectory tracking control of surface vessels [14,15,16,17,18,19,20,21]. For instance, much work has applied nonlinear model predictive control (NMPC) to trajectory tracking control [17, 20], and tracking controllers based on backstepping methods have been developed for robust control under disturbance-induced uncertainties [16, 18, 19]. Studies have also addressed dynamic positioning control, focusing on situations where the vessel speed is near zero [15].

Research on applying reinforcement learning (RL) to surface vessel control has recently increased [22,23,24,25,26,27,28]. RL is an area of machine learning that can be used to find the optimal action maximizing a cumulative reward obtained in an environment containing uncertainty [29]. Although RL requires training, it does not require real-time optimization computation because it uses function approximation via a neural network (NN). Martinsen et al. implemented an algorithm based on the deep deterministic policy gradient method to obtain a controller that minimizes the tracking errors on straight and curved paths in situations involving unknown currents [23, 24]. Subsequently, Martinsen et al. proposed an RL-based control scheme for trajectory tracking that achieves both dynamic positioning (DP) control and path following [25]. Moreover, Martinsen et al. extended the scheme to improve the tracking performance using NMPC [28].

However, berthing trajectories typically consist of complex motions such as turning, stopping the ship, backward motion, spinning, and crabbing motion. Therefore, training for trajectories that consist of any combination of motions is required.

Additionally, berthing maneuvers require collision avoidance. Because the purpose of berthing is to bring the vessel close to the desired berth, which is itself an obstacle, even a small tracking error can cause a collision. Moreover, it is impossible to keep the tracking error at zero in an uncertain environment.

Some studies on collision avoidance for surface vessels have already been undertaken [26, 27]. However, those studies treated path following and collision avoidance as a dual-objective problem, which places the two goals in a trade-off relationship: the vessel may fail to reach the berthing point because it is avoiding a collision with the berth. Therefore, in berthing maneuvers, it is effective to avoid the tracking errors that increase the probability of collision.

1.2 Scope of this study

The purpose of this study is to develop a methodology that avoids the tracking errors that may cause collisions with static obstacles. This paper proposes a training method for a trajectory tracking controller that reduces the probability of collision with static obstacles. The main contributions of this paper are as follows:

  1. The generation of random trajectories from maneuvering simulations was introduced to enable trajectory tracking controllers to track complex trajectories.

  2. A method to generate static pseudo-obstacles depending on the desired trajectory was proposed for training on the avoidance of collisions with obstacles.

  3. A measure representing the distance between a static obstacle and the vessel position was introduced to provide the controller with the geometric information describing the obstacle.

  4. A new reward function was proposed to train the trajectory tracking controller to prioritize the avoidance of the tracking errors that may cause a collision.

  5. The potential of the proposed method to reduce the probability of collisions in berthing maneuvering was demonstrated via numerical simulations. Furthermore, the results of model experiments conducted outdoors are shown.

The proposed method requires a training environment that includes a maneuvering model and the response characteristics of the steering systems. However, it is unnecessary to prepare a large number of berthing trajectories and obstacle information for training; only the berthing trajectory and obstacle information for the target harbor are required.

The remainder of the paper is organized as follows: Sect. 2 describes the training method of the trajectory tracking controller. Section 3 describes how the trajectory tracking controller can be applied to berthing maneuvers. Section 4 reports and analyzes the numerical simulation and model experiment results of berthing maneuvering. Finally, Sect. 5 presents the conclusions of the study.

2 Methods

The subject ship is the 3-m model ship shown in Fig. 1. L and B denote the length and breadth of this model ship, and their values are listed in Table 1. The ship is equipped with a single propeller, a VecTwin rudder, and a bow thruster. The VecTwin rudder system [30] has a pair of fishtail rudders and enables various maneuvers; it significantly enhances ship maneuverability [31] and enables the development of low-speed positioning systems [32]. In this paper, the rudders of the VecTwin rudder system are referred to as the VecTwin rudder.

Fig. 1 Subject ship in Inukai pond

Table 1 Particulars of the model ship

The actuator state is defined as \({\varvec{u}} \equiv \left( \delta _{\textrm{p}}, \delta _{\textrm{s}}, n_{\textrm{p}}, n_{\textrm{bt}} \right) ^{{\textsf{T}}} \in {\mathbb {R}}^4\), where \(\delta _{\textrm{p}}\) represents the rudder angle on the port side, \(\delta _{\textrm{s}}\) represents the rudder angle on the starboard side, \(n_{\textrm{p}}\) is the propeller revolution number, and \(n_{\textrm{bt}}\) is the bow thruster revolution number. In this study, the response characteristics of the steering rudder are taken into account. The control commands are defined as \({\varvec{u}}_{\textrm{cmd}} \equiv \left( \delta _{\textrm{p},\textrm{cmd}}, \delta _{\textrm{s},\textrm{cmd}}, n_{\textrm{p},\textrm{cmd}}, n_{\textrm{bt},\textrm{cmd}} \right) ^{\textsf{T}} \in {\mathbb {R}}^4\). The upper and lower limits of the actuator state variables are defined in Table 2. Note that the propeller revolution number is held constant at \(10 \ \textrm{rps}\) because the VecTwin rudder system enables various motions by changing the rudder angles while maintaining a constant propeller revolution number. This revolution number of \(10 \ \textrm{rps}\) allows the vessel to reach the typical vessel speed of harbor maneuvers on a real scale.

Table 2 Limits of the actuator state variables

Fig. 2 The coordinate systems used in this work

In this study, the motion of the vessel in the harbor is modeled by surge–sway–yaw three-degree-of-freedom equations of motion. The coordinate systems are the earth-fixed coordinates, represented by \(\textrm{o}_{0}-x_{0}y_{0}{z_{0}}\), and the ship-fixed coordinates, represented by \(\textrm{o}-xy{z}\), whose origin is at midship; both are shown in Fig. 2. \(\psi \), u, \(v_{\textrm{m}}\), and r denote the heading angle, the surge velocity, the sway velocity at midship, and the yaw angular velocity, respectively. The pose vector of the vessel is defined as \(\varvec{\eta } \equiv \left( x_{0}, y_{0}, \psi \right) ^{\textsf{T}} \in {\mathbb {R}}^{3}\), and the velocity vector is defined as \({\varvec{v}} \equiv \left( u, v_{\textrm{m}}, r\right) ^{\textsf{T}} \in {\mathbb {R}}^{3}\). The vessel state vector is defined as \({\varvec{x}} \equiv \left( \varvec{\eta }^{\textsf{T}}, {\varvec{v}}^{\textsf{T}} \right) ^{\textsf{T}} \in {\mathbb {R}}^6\), and the observed vessel state vector is defined as \(\hat{{\varvec{x}}} \in {\mathbb {R}}^6\). The vector of true wind speed and direction is defined as \({\varvec{w}} \equiv \left( U_{\textrm{T}}, \gamma _{\textrm{T}}\right) ^{\textsf{T}} \in {\mathbb {R}}^{2}\), where \(U_{\textrm{T}}\) represents the true wind speed and \(\gamma _{\textrm{T}}\) is the true wind direction. The true wind speed and direction are assumed to depend on time but not on space.
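For concreteness, the state and actuator vectors defined above can be held in a small container such as the following minimal Python sketch; the class and variable names are ours and do not appear in the paper.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class VesselState:
    """Pose eta = (x0, y0, psi) in earth-fixed axes and velocity v = (u, v_m, r) in ship-fixed axes."""
    eta: np.ndarray = field(default_factory=lambda: np.zeros(3))
    nu: np.ndarray = field(default_factory=lambda: np.zeros(3))

    @property
    def x(self) -> np.ndarray:
        """Full vessel state x = (eta^T, v^T)^T in R^6."""
        return np.concatenate([self.eta, self.nu])

# Actuator state u = (delta_p, delta_s, n_p, n_bt); n_p is held at 10 rps (Sect. 2).
u = np.array([0.0, 0.0, 10.0, 0.0])
# True wind vector w = (U_T, gamma_T).
w = np.array([0.0, 0.0])
```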

2.1 Optimization of the trajectory tracking controller

In this study, the trajectory tracking problem of minimizing the tracking error is formulated as a maximization problem related to a reward. This section describes the formulation used here.

This paper focuses on the trajectory tracking of the pose vector, \(\varvec{\eta }\). Here, the desired pose vector is defined as \({\varvec{r}} \in {\mathbb {R}}^{3}\). The time series of desired pose vectors can be represented as follows:

$$\begin{aligned} {\mathscr {R}} = \left\{ {\varvec{r}}_{k} \right\} _{k = 0, \ldots , N-1} \hspace{5.0pt}, \end{aligned}$$
(1)

where k denotes discrete-time steps, and N is the total number of steps in the desired trajectory. The time of k-th step is defined as \(t_{k}\), and the time step between \(t_{k}\) and \(t_{k+1}\) is defined as \(\Delta t\). The time step between successive decisions in the RL algorithm is also equal to \(\Delta t\).

In this study, the geometric information related to static obstacles is assumed to be given. The static obstacles are represented by multiple polygons, which can be defined as

$$\begin{aligned} {\mathscr {O}} = \bigcup _{i=1}^{N_{\textrm{obs}}}{S_{i}} \hspace{5.0pt}, \end{aligned}$$
(2)

where \(S_{i}\) represents the i-th obstacle polygon, and \(N_{\textrm{obs}}\) represents the total number of obstacle polygons.

In RL, the state variable for decision-making is often represented by \({\varvec{s}}\) and the action variables by \({\varvec{a}}\); these conventions are adopted in this work. The trajectory tracking controller makes decisions depending on the observed vessel state, \(\hat{{\varvec{x}}}\), the actuator state, \({\varvec{u}}\), the desired trajectory, \({\mathscr {R}}\), and the static obstacles, \({\mathscr {O}}\). Thus, the state variable and the action variables at the k-th step are defined as,

$$\begin{aligned} \left\{ \, \begin{aligned} {\varvec{s}}_{k}&= {\varvec{f}}_{s}\left( \hat{{\varvec{x}}}_{k}, {\varvec{u}}_{k}, {\mathscr {R}}, {\mathscr {O}}\right) \\ {\varvec{a}}_{k}&= {\varvec{u}}_{\textrm{cmd},k} \hspace{5.0pt}, \\ \end{aligned} \right. \end{aligned}$$
(3)

where \({\varvec{f}}_{s}\) is a function that generates the input of the controller; this function is described in Sect. 2.3. The action can then be determined as \({\varvec{a}}_{k} = \varvec{\mu }({\varvec{s}}_{k})\), where \(\varvec{\mu }\) is the policy function. This function is the trajectory tracking controller and is represented by a NN.

This study considers trajectory tracking as an episodic task in RL. A time series of the state and action variables throughout the episode is defined as \(\tau _{} = \left( {\varvec{s}}_{0}, {\varvec{a}}_{0}, \ldots , {\varvec{s}}_{N-1}, {\varvec{a}}_{N-1}, {\varvec{s}}_{N}\right) \). Let’s assume that \(\tau _{}\) is distributed according to a probability distribution \(\rho ^{\varvec{\mu }_{}}\). The distribution \(\rho ^{\varvec{\mu }_{}}\) is defined as follows:

$$\begin{aligned} \rho ^{\varvec{\mu }} = p({\varvec{s}}_{0}) \prod ^{N-1}_{k=0}p({\varvec{s}}_{k+1}|{\varvec{s}}_{k},{\varvec{a}}_{k}) \varvec{\mu }\left( {\varvec{s}}_{k}\right) \hspace{5.0pt}, \end{aligned}$$
(4)

where \(p({\varvec{s}}_{0})\) represents the probability distribution of the initial state, \({\varvec{s}}_{0}\), observed from the environment, and \(p({\varvec{s}}_{k+1}|{\varvec{s}}_{k},{\varvec{a}}_{k})\) represents the state transition probability distribution. These distributions were implemented in the numerical simulations presented in this work and are described in detail in Sect. 2.2. An objective function with a finite-time horizon, representing the expected cumulative reward obtained in the uncertain environment, can then be defined as,

$$\begin{aligned} J(\varvec{\mu }_{}) = E_{\tau _{} \sim \rho ^{\varvec{\mu }_{}}}\left[ \sum ^{N-1}_{k=0} \gamma ^{k} {R}\left( t_{k}, {\varvec{s}}_{k}, {\varvec{a}}_{k}\right) \right] \hspace{5.0pt}, \end{aligned}$$
(5)

where \(E_{\tau _{} \sim \rho ^{\varvec{\mu }_{}}}\left[ \cdot \right] \) represents the expectation value inside the bracket when \(\tau _{}\) is generated according to the distribution \(\rho ^{\varvec{\mu }_{}}\), \(\gamma \) is the discount rate, and \({R}(\cdot )\) is the reward function. The discount rate is a hyper-parameter that implies the idea of discounting the reward to be obtained in the distant future. The reward function \({R}(\cdot )\) is described in detail in Sect. 2.4.

This objective function represents the expected cumulative reward obtained over an episode in the training environment. The trajectory tracking controller is obtained by maximizing this objective function via RL.

In the field of artificial intelligence, various RL algorithms have been developed [29]. In the optimization problem of this paper, the action space is continuous, and deterministic policy functions are used. Therefore, the RL algorithms called deep deterministic policy gradient (DDPG) [33, 34] and twin-delayed deep deterministic policy gradient (TD3) [35] can be applied; TD3 is an algorithm that improves on DDPG and was used in this study.
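As a concrete reading of Eq. 5, the objective of a given policy can be estimated by rolling it out in the simulated environment and averaging the discounted returns. The sketch below assumes a generic episodic environment with reset()/step() methods, which is an assumption of ours; the TD3 optimization itself, which follows the standard algorithm of [35], is not shown.

```python
import numpy as np

def estimate_objective(env, policy, gamma, n_episodes=20):
    """Monte Carlo estimate of J(mu) in Eq. 5: the expected discounted cumulative
    reward over episodes generated by the deterministic policy mu."""
    returns = []
    for _ in range(n_episodes):
        s = env.reset()                     # draw s_0 ~ p(s_0)
        done, k, ret = False, 0, 0.0
        while not done:
            a = policy(s)                   # a_k = mu(s_k)
            s, r, done, info = env.step(a)  # sample s_{k+1} and reward R(t_k, s_k, a_k)
            ret += (gamma ** k) * r
            k += 1
        returns.append(ret)
    return float(np.mean(returns))
```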

2.2 Training environment

This section describes the training environment used in this work. In training, the desired trajectory and the static obstacles are generated for each episode, after which a maneuvering simulation is conducted. The methods used to generate the desired trajectory and static obstacles are described in Sect. 2.2.1, and the maneuvering simulation environment is described in detail in Sect. 2.2.2.

2.2.1 Desired trajectory and pseudo-obstacle

In training, the desired trajectories, \({\mathscr {R}}\), are generated stochastically for each episode, and the static pseudo-obstacles, \({\mathscr {O}}\), are generated depending on the desired trajectories. This section describes the generation methods used for these variables.

The manner in which the desired trajectories were generated is described here. The target of this research is to train the trajectory tracking controller for berthing maneuvers. It is reasonable to train on a set of desired berthing trajectories in the target harbor. However, collecting many berthing trajectories is a time-consuming task. Moreover, training based on a small number of specific trajectories is undesirable because it may be necessary to change the desired trajectory due to practical considerations. Therefore, in this paper, trajectories obtained from a maneuvering simulation subject to random control input are introduced.

The trajectories generated for training are required to include complex motions such as turning and backward motion. In this maneuvering simulation, the control inputs were determined randomly at each time step of the simulation. The distributions of the control inputs were chosen so that the time average of the thrust generated by the actuators was around zero. Thus, the bow thruster revolution number was drawn from the uniform distribution over the interval listed in Table 2. The port and starboard side rudder angles were drawn from the normal distributions \({\mathscr {N}}(75, 30^2)\) and \({\mathscr {N}}(-75, 30^2)\), respectively. Here, the mean values of these distributions were set to the values at which the net forces and moment are close to zero.

The initial state of the maneuvering simulation was determined randomly. The initial vessel velocity was determined based on the uniform distribution, whose interval is listed in Table 3. The selection of this interval was based on the typical vessel speed of harbor maneuvers on a real scale. The initial vessel pose was set to zero. In this maneuvering simulation, wind disturbance was neglected.
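A minimal sketch of how the random control inputs and initial velocities described above could be sampled is given below; the limit and interval values are placeholders standing in for the entries of Tables 2 and 3, and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder values: the actual limits and intervals are those of Tables 2 and 3.
N_BT_LIMIT = 20.0            # bow thruster revolution limit [rps] (assumed)
U0_RANGE = (-0.1, 0.3)       # initial surge velocity interval [m/s] (assumed)
VM0_RANGE = (-0.05, 0.05)    # initial sway velocity interval [m/s] (assumed)
R0_RANGE_DEG = (-0.5, 0.5)   # initial yaw rate interval [deg/s] (assumed)

def sample_random_command():
    """Random control input for trajectory generation (Sect. 2.2.1): rudder angles
    ~ N(+75, 30^2) and N(-75, 30^2) deg, bow thruster ~ uniform, n_p fixed at 10 rps."""
    delta_p = rng.normal(75.0, 30.0)
    delta_s = rng.normal(-75.0, 30.0)
    n_bt = rng.uniform(-N_BT_LIMIT, N_BT_LIMIT)
    return np.array([delta_p, delta_s, 10.0, n_bt])

def sample_initial_velocity():
    """Initial velocity drawn uniformly from the intervals of Table 3 (placeholders here)."""
    return np.array([rng.uniform(*U0_RANGE),
                     rng.uniform(*VM0_RANGE),
                     np.deg2rad(rng.uniform(*R0_RANGE_DEG))])
```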

Table 3 Initial velocity interval of the uniform distribution used to generate the desired trajectory

Fig. 3 The generation method of static pseudo-obstacles used in this work

The generation method of the static obstacles is described here. The static obstacles used in the training procedure were automatically generated according to the desired trajectory. Here, we impose the condition that the desired trajectory does not contact any obstacle. We also require that the generated obstacles consist of line segments that are all longer than the vessel length. A schematic of the proposed scheme is presented in Fig. 3, and the scheme can be summarized in the following steps:

  1. A grid is set on the \(x_{0}y_{0}\) plane such that one of the grid points coincides with the origin of the coordinate system. The grid size along both the x-axis and y-axis is 2L, and the number of grid cells is set so that the grid covers the region where the desired trajectory exists. The distance of the generated pseudo-obstacles from the desired trajectory can be controlled by the grid size: a smaller grid size places the obstacles closer to the trajectory. The grid size of 2L was therefore chosen so that the training environment is not too severe.

  2. The area where obstacles must not exist is then determined according to the desired trajectory. No obstacles are generated in the area through which the vessel passes when the motion of the desired trajectory is performed. Here, the vessel shape was represented by an ellipse whose semi-major axis is 0.75L and semi-minor axis is B. Additionally, no obstacles are generated in a circular area of radius 1.9L centered at the origin of the coordinate system. This circular area is introduced to avoid collisions arising from initial tracking errors: considering the initial tracking errors listed in Table 4, no obstacle should be generated in the four grid cells adjacent to the origin, and the radius of 1.9L ensures that no obstacles are generated in those cells.

  3. All grid cells through which the vessel does not pass are defined as static obstacles.

An example of the generated trajectory and pseudo-obstacles is shown in Fig. 4.
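The three steps above can be condensed into a short sketch such as the following, which marks as obstacles all grid cells of size 2L that are neither swept by the 0.75L-by-B vessel ellipse along the desired trajectory nor inside the 1.9L circle around the origin. The breadth value is a placeholder, the cell-centre test is a coarse simplification of the sweep check, and the function name is ours.

```python
import numpy as np

L, B = 3.0, 0.5   # ship length [m] and breadth [m]; B is a placeholder for the Table 1 value

def generate_pseudo_obstacles(trajectory, grid=2.0 * L):
    """Mark as static pseudo-obstacles all grid cells not swept by the vessel along the
    desired trajectory and not inside the 1.9L circle around the origin (Sect. 2.2.1).
    `trajectory` is an array of desired poses (x0, y0, psi)."""
    xy = trajectory[:, :2]
    # Step 1: grid of size 2L covering the region where the desired trajectory exists,
    # with a grid point at the origin of the coordinate system.
    i_min, i_max = int(np.floor(xy[:, 0].min() / grid)) - 1, int(np.ceil(xy[:, 0].max() / grid)) + 1
    j_min, j_max = int(np.floor(xy[:, 1].min() / grid)) - 1, int(np.ceil(xy[:, 1].max() / grid)) + 1
    obstacles = set()
    for i in range(i_min, i_max + 1):
        for j in range(j_min, j_max + 1):
            cx, cy = (i + 0.5) * grid, (j + 0.5) * grid   # cell centre (coarse test point)
            # Step 2: exclude cells swept by the 0.75L x B vessel ellipse along the trajectory ...
            swept = False
            for x0, y0, psi in trajectory:
                dx, dy = cx - x0, cy - y0
                xb = dx * np.cos(psi) + dy * np.sin(psi)    # ship-fixed coordinates of the centre
                yb = -dx * np.sin(psi) + dy * np.cos(psi)
                if (xb / (0.75 * L)) ** 2 + (yb / B) ** 2 <= 1.0:
                    swept = True
                    break
            # ... and cells inside the 1.9L circle that protects against initial tracking errors.
            if swept or np.hypot(cx, cy) <= 1.9 * L:
                continue
            # Step 3: every remaining cell is a static obstacle.
            obstacles.add((i, j))
    return obstacles
```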

Fig. 4 An example of a desired trajectory with the static pseudo-obstacles used in training

2.2.2 Maneuvering simulation

The simulation environment was implemented considering a low-speed maneuvering model, steering response characteristics, wind disturbance, and artificial noise; the inclusion of all these elements led to a realistic environment. This section describes the models used to generate each of these elements in detail.

First, the maneuvering model used in this study is described. The MMG (Maneuvering Modeling Group) model [36] for low-speed maneuvering was used. The submodels that constitute this MMG model are described in the following. Yoshimura's unified model [37] was used for the hull hydrodynamic forces. The resistance coefficient and linear hydrodynamic derivatives were determined through captive model tests, while the remaining coefficients were estimated using empirical formulae [37]. For the rudder and propeller forces, Kang's model [38] was used. The thrust coefficients of the propeller were obtained from a propeller open-water test, and for the remaining coefficients, the VLCC coefficients given by Kang [38] were applied. The thruster force was modeled based on Kobayashi's model [39]; the thrust coefficient was obtained experimentally. The reduction of the thruster force with ship speed was represented by a second-order polynomial in the Froude number, whose coefficients were determined through captive model tests. For the wind forces, Fujiwara's regression model [40] was used, with coefficients determined from the hull and superstructure geometry of the subject ship. The MMG model used here is denoted as follows:

$$\begin{aligned} \dot{{\varvec{x}}} = {\varvec{f}}_{\textrm{MMG}}\left( {\varvec{x}}, {\varvec{u}}, {\varvec{w}}\right) \hspace{5.0pt}, \end{aligned}$$
(6)

where the overdot, \(\dot{ }\), denotes the derivative with respect to time, t.

Second, the response characteristics of the actuator state, \({\varvec{u}}\), are described. The rudder steering system of the subject model ship approaches a commanded value at a constant speed. Therefore, the response characteristics of the rudder steering were modeled by the following expressions,

$$\begin{aligned} \left\{ \, \begin{aligned} \dot{\delta }_{\textrm{p}}&= K \cdot \text {sign}\left( \delta _{\textrm{p},\textrm{cmd}} - \delta _{\textrm{p}} \right) \\ \dot{\delta }_{\textrm{s}}&= K \cdot \text {sign}\left( \delta _{\textrm{s},\textrm{cmd}} - \delta _{\textrm{s}} \right) \hspace{5.0pt}, \\ \end{aligned} \right. \end{aligned}$$
(7)

where K is the rudder steering speed and was set to \(20 \ \mathrm {deg./s}\); this value is based on measurements of the actual model ship system. The response characteristics of the bow thruster revolution number were neglected, and the bow thruster revolution number is given as \(n_{\textrm{bt}} = n_{\textrm{bt}, \textrm{cmd}}\).
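Because the rudder moves toward the command at a constant speed, Eq. 7 can be integrated exactly over one simulation step; a minimal sketch is shown below (the function name is ours).

```python
import numpy as np

K_RUDDER = 20.0  # rudder steering speed [deg/s] (Sect. 2.2.2)

def step_rudder(delta, delta_cmd, dt):
    """Exact update of Eq. 7 over a step dt: the rudder moves toward the command
    at the constant speed K and stops once the command is reached."""
    move = np.clip(delta_cmd - delta, -K_RUDDER * dt, K_RUDDER * dt)
    return delta + move
```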

The wind process was generated using the method proposed by Maki et al. [41]. In this method, a one-dimensional filter equation based on the wind speed spectrum, approximated by Hino's spectrum [42], is solved numerically using the Euler–Maruyama method. This method can generate the wind process when the time-averaged true wind speed and direction are given; these are defined as \(\bar{U}_{\textrm{T}}\) and \(\bar{\gamma }_{\textrm{T}}\), respectively, and are determined stochastically for each episode.

The maneuvering model described in Eq. 6 was solved numerically using the Runge–Kutta method, and the response characteristics described in Eq. 7 were solved analytically. Here, the time step of the numerical solution is defined as \(\Delta t_{\textrm{sim}}\); this time step was set to \(0.1 \ \textrm{s}\), which is shorter than the decision time step \(\Delta t\). The initial value of the vessel state is given such that the tracking error relative to the desired trajectory follows a uniform distribution over the intervals listed in Table 4. The initial value of the actuator state is given as \((0.0, \ 0.0, \ 0.0, \ 0.0)\).
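A sketch of the corresponding integration loop is given below, assuming a callable f_mmg implementing Eq. 6; the actuator response of Eq. 7 and the noise terms described next would be updated inside the same loop, and the sub-stepping scheme shown here is our reading of the text rather than the authors' code.

```python
DT_SIM = 0.1  # integration time step Delta t_sim [s]

def rk4_step(f_mmg, x, u, w, dt=DT_SIM):
    """One fourth-order Runge-Kutta step of the MMG dynamics x_dot = f_MMG(x, u, w) (Eq. 6)."""
    k1 = f_mmg(x, u, w)
    k2 = f_mmg(x + 0.5 * dt * k1, u, w)
    k3 = f_mmg(x + 0.5 * dt * k2, u, w)
    k4 = f_mmg(x + dt * k3, u, w)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def advance(f_mmg, x, u, w, dt_decision):
    """Sub-step the dynamics between two decision instants (dt_decision = Delta t)."""
    n_sub = int(round(dt_decision / DT_SIM))
    for _ in range(n_sub):
        x = rk4_step(f_mmg, x, u, w)
    return x
```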

Table 4 Uniform distribution intervals of the initial tracking error from the berthing trajectory

Furthermore, process noise was added to the MMG model, and observation noise was added to the observation of the vessel state. The process noise and the observation noise were defined to follow normal distributions, and their covariance matrices are represented by \(\varvec{\Sigma }_{\textrm{sys}} \in {\mathbb {R}}^{6\times 6}\) and \(\varvec{\Sigma }_{\textrm{obs}} \in {\mathbb {R}}^{6\times 6}\), respectively. The covariance matrix of the observation noise was determined from the nominal observation accuracy given in the equipment specifications.

2.3 Input of the controller

This section describes the function \({\varvec{f}}_{s}\) used in Eq. 3. The trajectory tracking controller makes decisions depending on the vessel state, tracking error, and geometric relationship between the vessel and the obstacles. In the following, these elements determining the state, \({\varvec{s}}\), are described.

The tracking error is defined as the error between the vessel pose vector, \(\varvec{\eta }\), and the desired state, \({\varvec{r}}\); this error is expressed in the ship-fixed coordinate system \(\textrm{o}-xy\) and is represented as follows:

$$\begin{aligned} {\varvec{e}}_{i,j}= \begin{bmatrix} \cos \psi &{}\quad \sin \psi &{}\quad 0 \\ -\sin \psi &{}\quad \cos \psi &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 \\ \end{bmatrix} \left( {\varvec{r}}_{i}-\varvec{\eta }_{j}\right) \hspace{5.0pt}, \end{aligned}$$
(8)

where i and j represent the time steps of the desired state and the vessel pose, respectively. The geometric relationship between the vessel and the obstacles is defined as the error between the vessel position and the obstacle point nearest to the desired position; this quantity can be written as,

$$\begin{aligned} \tilde{{\varvec{e}}}_{i,j}= \begin{bmatrix} \cos \psi &{}\quad \sin \psi \\ -\sin \psi &{}\quad \cos \psi \\ \end{bmatrix} \left( {\varvec{o}}_{\textrm{near},i}-\tilde{\varvec{\eta }}_{j}\right) \hspace{5.0pt}, \end{aligned}$$
(9)

where \(\tilde{\varvec{\eta }}_{j}\) is defined as the position component of \(\varvec{\eta }_{j}\) and \({\varvec{o}}_{\textrm{near},i}\) is defined as,

$$\begin{aligned} {\varvec{o}}_{\textrm{near},i} = \mathop {\text {argmin}}\limits _{{\varvec{o}} \in {\mathscr {O}}} \Vert {\varvec{o}} - \tilde{{\varvec{r}}}_{i}\Vert \hspace{5.0pt}, \end{aligned}$$
(10)

where \(\tilde{{\varvec{r}}}_{i}\) denotes the position component of \({\varvec{r}}_{i}\).

In this study, \({\varvec{s}}_{k}\) is defined to consist of the tracking error, \({\varvec{e}}_{i,j}\), the distance to the static obstacle, \(\tilde{{\varvec{e}}}_{i,j}\), the vessel velocity, \({\varvec{v}}_{k}\), and the actuator state, \({\varvec{u}}_{k}\). \({\varvec{s}}_{k}\) is thus expressed as,

$$\begin{aligned} \begin{aligned} {\varvec{s}}_{k} =&\left( {\varvec{e}}_{k,k}^{\textsf{T}}, \ {\varvec{e}}_{k+\frac{T_{1}}{\Delta t},k}^{\textsf{T}}, \ {\varvec{e}}_{k+\frac{T_{2}}{\Delta t},k}^{\textsf{T}}, \ \ldots , \ \right. \\&\left. \ \ \tilde{{\varvec{e}}}_{k,k}^{\textsf{T}}, \ \tilde{{\varvec{e}}}_{k+\frac{T_{1}}{\Delta t},k}^{\textsf{T}}, \ \tilde{{\varvec{e}}}_{k+\frac{T_{2}}{\Delta t},k}^{\textsf{T}}, \ \ldots , \ {\varvec{v}}_{k}^{\textsf{T}}, \ {\varvec{u}}_{k}^{\textsf{T}} \right) ^{\textsf{T}} \hspace{5.0pt}, \end{aligned} \end{aligned}$$
(11)

where \(T_{1}, T_{2}, \ldots \) represent arbitrary look-ahead times that are assumed to be positive multiples of \(\Delta t\). Thus, this study also uses the error between the current vessel state and future desired states as an input to the controller. The function \({\varvec{f}}_{s}\) is therefore defined via Eqs. 8–11.
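A minimal sketch of the function \({\varvec{f}}_{s}\) is given below. For simplicity, the obstacle set is represented by an array of boundary points rather than polygons, and the look-ahead index is clamped at the end of the trajectory; both are assumptions on our part, as are the function names.

```python
import numpy as np

def rot2(psi):
    """Rotation from earth-fixed to ship-fixed axes."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s], [-s, c]])

def tracking_error(r_i, eta_j):
    """Eq. 8: pose error expressed in the ship-fixed frame."""
    e_xy = rot2(eta_j[2]) @ (r_i[:2] - eta_j[:2])
    return np.array([e_xy[0], e_xy[1], r_i[2] - eta_j[2]])

def nearest_obstacle_point(obstacle_points, r_i):
    """Eq. 10: obstacle point closest to the desired position (obstacles represented
    here by an array of boundary points, a simplification of the polygon set O)."""
    d = np.linalg.norm(obstacle_points - r_i[:2], axis=1)
    return obstacle_points[np.argmin(d)]

def obstacle_error(o_near, eta_j):
    """Eq. 9: vector from the vessel position to the nearest obstacle point, ship-fixed."""
    return rot2(eta_j[2]) @ (o_near - eta_j[:2])

def build_state(x_hat, u, desired, obstacle_points, k, horizons):
    """Eq. 11: concatenate current and look-ahead tracking errors, obstacle errors,
    the velocity and the actuator state into the controller input s_k.
    `horizons` holds the offsets T_1/Delta t, T_2/Delta t, ... in steps."""
    eta, nu = x_hat[:3], x_hat[3:]
    offsets = [0] + list(horizons)
    parts = []
    for h in offsets:                                   # tracking errors e_{k+h,k}
        idx = min(k + h, len(desired) - 1)              # clamp at the trajectory end (assumption)
        parts.append(tracking_error(desired[idx], eta))
    for h in offsets:                                   # obstacle errors e~_{k+h,k}
        idx = min(k + h, len(desired) - 1)
        o_near = nearest_obstacle_point(obstacle_points, desired[idx])
        parts.append(obstacle_error(o_near, eta))
    return np.concatenate(parts + [nu, u])
```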

2.4 Reward function

This section describes the reward function used in this study. The reward function was designed to obtain a trajectory tracking controller that minimizes tracking errors and control effort. Furthermore, we propose a method to preferentially minimize the tracking errors that may lead to a collision with static obstacles.

One possible measure of the tracking error is the norm of the error between the vessel pose, \(\varvec{\eta }\), and the desired state, \({\varvec{r}}\). However, this measure requires the time-consuming task of adjusting weights because the position and the heading angle have different units. Therefore, the measures of the tracking error were defined based on the errors of the bow and stern positions, which can be expressed as,

$$\begin{aligned} \left\{ \begin{aligned} e_{\textrm{bow},k}&= \left\| {\varvec{g}}_{\textrm{bow}}(\varvec{\eta }_{k}) - {\varvec{g}}_{\textrm{bow}}({\varvec{r}}_{k}) \right\| \\ e_{\textrm{stern},k}&= \left\| {\varvec{g}}_{\textrm{stern}}(\varvec{\eta }_{k}) - {\varvec{g}}_{\textrm{stern}}({\varvec{r}}_{k}) \right\| \hspace{5.0pt}, \end{aligned}\right. \end{aligned}$$
(12)

where \({\varvec{g}}_{\textrm{bow}} \in {\mathbb {R}}^{2}\) and \({\varvec{g}}_{\textrm{stern}} \in {\mathbb {R}}^{2}\) denote the bow and stern positions, respectively; these variables are defined as,

$$\begin{aligned} \left\{ \begin{aligned} {\varvec{g}}_{\textrm{bow}}(\varvec{\eta })&= \left( x_{0}+\frac{1}{2}L\cos {\psi }, \ y_{0}+\frac{1}{2}L\sin {\psi }\right) ^{\textsf{T}} \\ {\varvec{g}}_{\textrm{stern}}(\varvec{\eta })&= \left( x_{0}-\frac{1}{2}L\cos {\psi }, \ y_{0}-\frac{1}{2}L\sin {\psi }\right) ^{\textsf{T}} \hspace{5.0pt}. \end{aligned}\right. \end{aligned}$$
(13)
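Equations 12 and 13 amount to the following short computation (a sketch; the function names are ours):

```python
import numpy as np

L_SHIP = 3.0  # ship length L [m]

def bow_stern(eta):
    """Eq. 13: bow and stern positions for a pose eta = (x0, y0, psi)."""
    x0, y0, psi = eta
    half = 0.5 * L_SHIP * np.array([np.cos(psi), np.sin(psi)])
    mid = np.array([x0, y0])
    return mid + half, mid - half

def bow_stern_errors(eta_k, r_k):
    """Eq. 12: tracking-error measures defined as bow and stern position errors."""
    g_bow, g_stern = bow_stern(eta_k)
    g_bow_d, g_stern_d = bow_stern(r_k)
    return np.linalg.norm(g_bow - g_bow_d), np.linalg.norm(g_stern - g_stern_d)
```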
Fig. 5 Illustration of the relationship between the vessel and static obstacle

We also define a measure of the distance between the vessel and the static obstacles. Since the purpose of berthing is to bring the vessel close to the berth, a measure that prevents the vessel from approaching the berthing point is undesirable. For this reason, a measure that introduces no trade-off between reducing tracking errors and avoiding collisions with the obstacles is defined here.

In this study, we introduced the imaginary obstacle line shown in Fig. 5; this line passes through \({\varvec{o}}_{\textrm{near},k}\), and its angle of inclination is equal to the desired heading angle. The normal vector in the \(x_{0}y_{0}\) plane orthogonal to the imaginary obstacle line was also introduced. This normal vector is defined as \({\varvec{n}}_{k} \in {\mathbb {R}}^2\) and can be calculated according to,

$$\begin{aligned} \begin{bmatrix} {\varvec{n}}_{k}\\ 0 \end{bmatrix} = \begin{bmatrix} {\varvec{g}}_{\textrm{bow}}({\varvec{r}}_{k}) - {\varvec{g}}_{\textrm{stern}}({\varvec{r}}_{k})\\ 0 \end{bmatrix} \times {\varvec{e}}_{z} \hspace{5.0pt}, \end{aligned}$$
(14)

where \({\varvec{e}}_{z}\) is the unit vector orthogonal to the \(x_{0}y_{0}\) plane, i.e., \((0, 0, 1)^{\textsf{T}}\). The distance between the desired position and the imaginary obstacle line is then expressed as,

$$\begin{aligned} l_{k} = \left| \frac{{\varvec{n}}_{k} \cdot \left( \tilde{{\varvec{r}}}_{k} - {\varvec{o}}_{\textrm{near},k}\right) }{\left\| {\varvec{n}}_{k}\right\| } \right| \hspace{5.0pt}. \end{aligned}$$
(15)

The distances from the actual bow and stern positions to the imaginary obstacle line are similarly expressed as follows:

$$\begin{aligned} \left\{ \begin{aligned} l_{\textrm{bow},k}&= \left| \frac{{\varvec{n}}_{k} \cdot \left( {\varvec{g}}_{\textrm{bow}}(\varvec{\eta }_{k}) - {\varvec{o}}_{\textrm{near},k}\right) }{\left\| {\varvec{n}}_{k}\right\| } \right| \\ l_{\textrm{stern},k}&= \left| \frac{{\varvec{n}}_{k} \cdot \left( {\varvec{g}}_{\textrm{stern}}(\varvec{\eta }_{k}) - {\varvec{o}}_{\textrm{near},k}\right) }{\left\| {\varvec{n}}_{k}\right\| } \right| \hspace{5.0pt}, \end{aligned}\right. \end{aligned}$$
(16)

Then, the indicators that increase as the vessel approaches the obstacles compared to the desired position can be expressed as \(l_{k} - l_{\textrm{bow},k}\) and \(l_{k} - l_{\textrm{stern},k}\).

Thus, the measures of nearness between the vessel and the static obstacles based on the desired position were defined as follows:

$$\begin{aligned} \left\{ \begin{aligned} c_{\textrm{bow},k}&= \max \left\{ 0, \frac{l_{k} - l_{\textrm{bow},k}}{l_{k}}\right\} \\ c_{\textrm{stern},k}&= \max \left\{ 0, \frac{l_{k} - l_{\textrm{stern},k}}{l_{k}}\right\} \hspace{5.0pt}. \end{aligned}\right. \end{aligned}$$
(17)

Note that the cases where the vessel position is farther from the obstacles than the desired position were ignored since we focused on only reducing the tracking errors that increase the probability of collisions.
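Equations 14–17 can be summarized in the following sketch, which repeats the bow/stern helper of the previous snippet so that it is self-contained; as in the paper, the desired position is assumed not to lie on the imaginary obstacle line.

```python
import numpy as np

L_SHIP = 3.0  # ship length L [m]

def bow_stern(eta):
    """Eq. 13: bow and stern positions for a pose eta = (x0, y0, psi)."""
    mid = np.array(eta[:2])
    half = 0.5 * L_SHIP * np.array([np.cos(eta[2]), np.sin(eta[2])])
    return mid + half, mid - half

def nearness_measures(eta_k, r_k, o_near_k):
    """Eqs. 14-17: how much closer the actual bow/stern are to the imaginary obstacle
    line (through o_near, inclined at the desired heading) than the desired position is."""
    g_bow_d, g_stern_d = bow_stern(r_k)
    g_bow, g_stern = bow_stern(eta_k)
    d = g_bow_d - g_stern_d
    n = np.array([d[1], -d[0]])               # Eq. 14: (dx, dy, 0) x (0, 0, 1) = (dy, -dx, 0)
    n_hat = n / np.linalg.norm(n)
    l_des = abs(n_hat @ (np.array(r_k[:2]) - o_near_k))   # Eq. 15 (assumed nonzero)
    l_bow = abs(n_hat @ (g_bow - o_near_k))                # Eq. 16
    l_stern = abs(n_hat @ (g_stern - o_near_k))
    c_bow = max(0.0, (l_des - l_bow) / l_des)              # Eq. 17
    c_stern = max(0.0, (l_des - l_stern) / l_des)
    return c_bow, c_stern
```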

Moreover, the tolerances of the measures described in Eqs. 12 and 17 were introduced. These tolerances are defined as,

$$\begin{aligned} \left\{ \begin{aligned} e_{\textrm{tol},k}&= \left( e_{0} - e_{\infty } \right) \exp {\left( - b_{\textrm{e}} t_{k} \right) } + e_{\infty } \\ c_{\textrm{tol},k}&= \left( c_{0} - c_{\infty } \right) \exp {\left( - b_{\textrm{c}} t_{k} \right) } + c_{\infty } \hspace{5.0pt}, \end{aligned}\right. \end{aligned}$$
(18)

where \(e_{0}\) and \(c_{0}\) are the initial tolerances, \(e_{\infty }\) and \(c_{\infty }\) are the tolerances as the episode time tends to infinity, and \(b_{\textrm{e}}\) and \(b_{\textrm{c}}\) are the attenuation factors of the tolerances.

If the measures given in Eqs. 12 and 17 exceed these tolerances, no further rewards are given and the episode is stopped. In other words, the episode is terminated if one of the following inequalities is not satisfied:

$$\begin{aligned} \left\{ \begin{aligned} e_{\textrm{bow},k}&< e_{\textrm{tol},k} \\ e_{\textrm{stern},k}&< e_{\textrm{tol},k} \\ c_{\textrm{bow},k}&< c_{\textrm{tol},k} \\ c_{\textrm{stern},k}&< c_{\textrm{tol},k} \hspace{5.0pt}. \end{aligned}\right. \end{aligned}$$
(19)

The episode is also terminated if a collision occurs. The collision detection area is the same as the area defined in Sect. 2.2.1; if this area contacts an obstacle, a collision is judged to have occurred and the episode is terminated.
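A sketch of the tolerance decay of Eq. 18 and the termination test of Eq. 19 follows; the numerical values of the tolerance parameters are those of Table 10 and are not reproduced here.

```python
import numpy as np

def tolerance(t, v0, v_inf, b):
    """Eq. 18: tolerance decaying exponentially from v0 toward v_inf."""
    return (v0 - v_inf) * np.exp(-b * t) + v_inf

def episode_terminated(t_k, e_bow, e_stern, c_bow, c_stern,
                       e0, e_inf, b_e, c0, c_inf, b_c):
    """Eq. 19: the episode stops as soon as any measure reaches its tolerance."""
    e_tol = tolerance(t_k, e0, e_inf, b_e)
    c_tol = tolerance(t_k, c0, c_inf, b_c)
    return not (e_bow < e_tol and e_stern < e_tol and c_bow < c_tol and c_stern < c_tol)
```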

The reward function can thus be defined as follows:

$$\begin{aligned} \begin{aligned}&{R}\left( t_{k}, {\varvec{s}}_{k}, {\varvec{a}}_{k}, {\mathscr {R}}, {\mathscr {O}}\right) = \frac{e_{\textrm{tol},k} - e_{\textrm{bow},k}}{e_{\textrm{tol},k}} + \frac{e_{\textrm{tol},k} - e_{\textrm{stern},k}}{e_{\textrm{tol},k}} \\ {}&\quad + \frac{c_{\textrm{tol},k} - c_{\textrm{bow},k}}{c_{\textrm{tol},k}} + \frac{c_{\textrm{tol},k} - c_{\textrm{stern},k}}{c_{\textrm{tol},k}} - \lambda \sum _{i=1}^{4} \left( \frac{u_{\textrm{cmd},i} - u_{\textrm{c},i}}{u_{\textrm{std},i}} \right) ^2, \end{aligned} \end{aligned}$$
(20)

where \(\lambda \) is a positive constant, \(u_{\textrm{c},i}\) represents the control input with the lowest control effort, and \(u_{\textrm{std},i}\) is a constant that normalizes each variable. Here, if \(\lambda \) is large, the reward can become negative, and the controller may then obtain a higher return by terminating the episode early. For this reason, \(\lambda \) is set to a small value; we used \(\lambda =1/300\) in the numerical experiments.
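For reference, Eq. 20 corresponds to the following computation (a sketch; the argument names are ours):

```python
import numpy as np

LAMBDA = 1.0 / 300.0  # control-effort weight lambda used in the numerical experiments

def reward(e_bow, e_stern, c_bow, c_stern, e_tol, c_tol, u_cmd, u_c, u_std):
    """Eq. 20: two tracking-error terms, two obstacle-nearness terms,
    and a penalty on the normalized control effort."""
    tracking = (e_tol - e_bow) / e_tol + (e_tol - e_stern) / e_tol
    nearness = (c_tol - c_bow) / c_tol + (c_tol - c_stern) / c_tol
    effort = LAMBDA * np.sum(((np.asarray(u_cmd) - np.asarray(u_c)) / np.asarray(u_std)) ** 2)
    return tracking + nearness - effort
```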

3 Application to berthing maneuvers

In this paper, the proposed methods were evaluated in terms of tracking the berthing trajectory. This section describes the generation method of the berthing trajectories and how the tracking controller is applied to them. The target harbor of this study is Inukai pond, located at Osaka University. The port geometry is shown in Fig. 6.

Fig. 6 Port geometry of experimental pond

3.1 Trajectory planning

In this study, the desired trajectories for berthing maneuvers were generated according to the trajectory planning method proposed by Miyauchi et al. [10]. In this algorithm, the trajectory planning of the berthing was modeled as a time-minimizing problem, and the spatial constraint and the terminal condition are embedded in the objective function. This problem is solved by the framework of optimal control theory [43, 44] with the use of a covariance matrix adaptation evolution strategy (CMA-ES) (see, for example, Ref. [45]).

Here, the obtained trajectory is defined as,

$$\begin{aligned} {\mathscr {R}}^{\prime } = \left\{ {\varvec{r}}^{\prime }_{{k^\prime }} \right\} _{{k^\prime } = 0, \ldots , N^{\prime }-1} \hspace{5.0pt}, \end{aligned}$$
(21)

where \(k^\prime \) denotes the discrete-time step, and \(N^{\prime }\) is the total number of time steps. Note that the new subscript \(k^\prime \) is introduced because the time step of the planned trajectory does not necessarily equal that of the desired trajectories generated in training: the latter is equal to the decision-making time step of the RL algorithm, whereas the former is determined by the trajectory planning. The time at which the \(k^\prime \)-th step occurs is defined as \(t_{k^\prime }\), and the time step between \(t_{k^\prime }\) and \(t_{k^\prime +1}\) is defined as \(\Delta t^{\prime }\).

We prepared 44 trajectories with 4 different terminal conditions and 11 different initial conditions; these conditions are listed in Table 5. The target poses of the terminal conditions were defined at points 1.5B away from the berth and have different heading angles. The vessel pose of each initial condition was set at a distance of about 8L from the target poses, and the vessel speed of the initial condition was set to a speed that can be sufficiently reduced before reaching the target point. The tolerances of the terminal condition were determined based on the previous paper [10]. In the trajectory planning, the limits of the actuator commands were changed from those listed in Table 2 in order to generate trajectories with sufficient margins of control force; the limits of the actuator states used are shown in Table 6. This idea was proposed by Kose et al. [46]. The obtained trajectories are shown in Fig. 7.

Table 5 Initial and terminal conditions, and tolerances used in trajectory planning
Table 6 Limits of actuator state variables used in trajectory planning

Fig. 7 Desired trajectories generated by the trajectory planning procedure

3.2 Selection of the desired state

This section describes the method used to select the desired pose vector from the desired trajectory. In ordinary trajectory tracking, the desired state is selected according to the desired trajectory alone and does not depend on the current state. When there is a considerable error between the current state and the desired state, the vessel velocity deviates significantly from the velocity assumed on the desired trajectory in order to reduce the error, which may lead to an undesired increase in vessel speed. In a berthing maneuver, the tracking control must not exceed the desired vessel speed, because the vessel may otherwise be unable to stop at the berthing point.

Therefore, the desired pose vector was chosen from the desired trajectory according to the vessel position. The distance between the current state \(\varvec{\eta }_{k}\) and a desired state \({\varvec{r}}^{\prime }_{k^{\prime }}\) is calculated as follows:

$$\begin{aligned} L\left( k^{\prime }, k\right) = \left( {\varvec{r}}^{\prime }_{k^{\prime }} - \varvec{\eta }_{k} \right) ^{\textsf{T}} W \left( {\varvec{r}}^{\prime }_{k^{\prime }} - \varvec{\eta }_{k} \right) \hspace{5.0pt}, \end{aligned}$$
(22)

where \(W = \textrm{diag}\left( 1, 1, 0\right) \); thus, the error of the heading angle is ignored. Let \({\varvec{r}}^{\prime }_{i_{k}}\) denote the state in the desired trajectory nearest to the current state \(\varvec{\eta }_{k}\). Then, \(i_{k}\) is selected by

$$\begin{aligned} i_{k} = \mathop {\text {argmin}}\limits _{k^{\prime } \in K_{k}} \left\{ L\left( k^{\prime }, k\right) \right\} \hspace{5.0pt}, \end{aligned}$$
(23)

where \(K_{k}\) denotes the search range. This range is determined as follows:

$$\begin{aligned} K_{k} \equiv {\left\{ \begin{array}{ll} \left\{ 0, 1, \ldots , N^{\prime }-1 \right\} &{} (k = 0)\\ \left\{ i_{k-1}+1, \ldots , i_{k-1}+I \right\} &{} (k > 0) \end{array}\right. } \hspace{5.0pt}, \end{aligned}$$
(24)

where I is a constant that should be chosen to avoid shortcuts when desired trajectories intersect. In summary, the desired pose vectors used as controller inputs are selected as follows:

$$\begin{aligned} \left\{ \begin{aligned} {\varvec{r}}_{k}&= {\varvec{r}}^{\prime }_{i_{k}} \\ {\varvec{r}}_{k + \frac{T_{1}}{\Delta t}}&= {\varvec{r}}^{\prime }_{i_{k} + \frac{T_{1}}{\Delta t^{\prime }}} \\ {\varvec{r}}_{k + \frac{T_{2}}{\Delta t}}&= {\varvec{r}}^{\prime }_{i_{k} + \frac{T_{2}}{\Delta t^{\prime }}} \\&\vdots \quad \quad \quad . \\ \end{aligned}\right. \end{aligned}$$
(25)

Note that \(T_{1}, T_{2}, \ldots \) must also be positive multiples of \(\Delta t^{\prime }\).
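A minimal sketch of this selection procedure (Eqs. 22–25) is given below; clamping the look-ahead indices at the end of the planned trajectory is our assumption, as the paper does not state how the trajectory end is handled.

```python
import numpy as np

def select_desired(traj, eta_k, i_prev, I, horizons_prime):
    """Eqs. 22-25: select the desired pose from the planned trajectory R' by position.
    `traj` is the planned trajectory (N' x 3), `i_prev` the index chosen at the previous
    step (None at k = 0), `I` the search-window length, and `horizons_prime` the
    look-ahead offsets T_1/Delta t', T_2/Delta t', ... in steps."""
    d = traj[:, :2] - eta_k[:2]
    cost = np.einsum('ij,ij->i', d, d)              # Eq. 22 with W = diag(1, 1, 0)
    if i_prev is None:
        candidates = np.arange(len(traj))           # k = 0: search the whole trajectory
    else:
        hi = min(i_prev + I, len(traj) - 1)         # Eq. 24, clamped at the trajectory end
        candidates = np.arange(min(i_prev + 1, hi), hi + 1)
    i_k = int(candidates[np.argmin(cost[candidates])])   # Eq. 23
    # Eq. 25: current and look-ahead desired poses (indices clamped, an assumption of ours).
    refs = [traj[min(i_k + h, len(traj) - 1)] for h in [0] + list(horizons_prime)]
    return i_k, refs
```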

3.3 Pseudo-obstacles in the target harbor

Owing to the characteristics of NNs, tracking performance degrades in situations that differ from those experienced in training. In other words, if a controller input \({\varvec{s}}_{k}\) that did not occur in training is fed into the policy function represented by the NNs, the output of the policy function is unpredictable.

In the target harbor, situations may be encountered that did not occur in the training. For instance, in the target harbor, the values of \(\tilde{{\varvec{e}}}_{k,k}, \tilde{{\varvec{e}}}_{k+\frac{T_{1}}{\Delta t},k}, \tilde{{\varvec{e}}}_{k+\frac{T_{2}}{\Delta t},k}, \ldots \) included in Eq. 11 are larger than those encountered in training, because the target harbor has more open space than the pseudo-obstacle environments generated in training. Therefore, pseudo-obstacles were also generated in the target harbor to prevent degradation of the tracking performance.

Applying the method described in Sect. 2.2.1 directly would generate pseudo-obstacles that significantly interfere with the berthing maneuver. Therefore, the ellipse used to represent the vessel shape was replaced by a circular area of radius 1.9L. Since the generated pseudo-obstacles are dummies, collisions with them are ignored. An example generated by the method described here is shown in Fig. 8.

Fig. 8 A sample of the generated pseudo-obstacles in the target harbor

4 Results

This section presents the training results of the proposed method and the results related to the berthing maneuver. We present the tracking results of the berthing trajectories obtained from both simulation and model experiments.

In this study, for comparison with the proposed method, we prepared a trajectory tracking controller that minimizes the tracking error and control effort in the absence of static obstacles. Here, \(\tilde{{\varvec{e}}}_{k,k}, \tilde{{\varvec{e}}}_{k+\frac{T_{1}}{\Delta t},k}, \tilde{{\varvec{e}}}_{k+\frac{T_{2}}{\Delta t},k}, \ldots \) included in Eq. 11 are set to zero, and the third and fourth terms of the proposed reward function described in Eq. 20 are also set to zero. This trajectory tracking controller is referred to as Ctrl-w/o-OBST, whereas the trajectory tracking controller obtained by the proposed method is referred to as Ctrl-w/-OBST.

The training results of Ctrl-w/o-OBST and Ctrl-w/-OBST are shown in Sect. 4.1, and the tracking results of the berthing trajectories using Ctrl-w/o-OBST and Ctrl-w/-OBST are shown in Sect. 4.2.

4.1 Training results of trajectory tracking controller

This section shows the training result of the proposed method. Ctrl-w/-OBST and Ctrl-w/o-OBST were trained five times each. The training was undertaken for a total of \(3.0\times 10^7\,\textrm{s}\) of simulation time.

The parameters used during training are summarized here. The environmental parameters are listed in Table 7, and the parameters describing the tracking controller and the NNs are listed in Tables 8 and 9, respectively. Here, the output dimension of the policy function differs from the dimension of the actuator command because the propeller revolution number is constant. The parameters of the objective function and the reward function are listed in Table 10. The values of \(e_{0}\) are set to be almost twice as large as the initial tracking error, and the values of \(b_{\textrm{e}}\) and \(b_{\textrm{c}}\) are set such that the allowed range is halved after 50 s. Moreover, the hyperparameters of TD3 were the same as those listed in Table 3 of [35], except for the batch size, which was 512.

The NN parameters acquired during training were saved, and all saved parameter sets were evaluated. In the evaluation, 20 episodes were simulated in the same environment as was used in the training episodes, and the average cumulative rewards were calculated. The average cumulative rewards and the 95% confidence intervals for the five training runs are shown in Fig. 9. We see that the results do not deviate significantly from the \(95\%\) confidence interval. Therefore, the parameter set with the highest cumulative reward among the five training runs was selected as the most appropriate. Note that the cumulative rewards of Ctrl-w/-OBST and Ctrl-w/o-OBST cannot be compared directly because their reward functions are different.

Table 7 Parameters describing the training environment
Table 8 Numerical values used for the parameters present in Eq. 11
Table 9 Details of the NN architecture
Table 10 Numerical values used for the parameters present in Eqs. 5 and 20

Fig. 9 Average cumulative reward of the 20 episodes used for the evaluation

4.2 Tracking results related to the berthing trajectories

This section shows the tracking results related to berthing trajectories. Here, a shorter decision-making interval of \(\Delta t = 1.0 \ \textrm{s}\) was used for the evaluation, and the time step of the berthing trajectory was set to \(\Delta t^{\prime } = 0.2 \ \textrm{s}\). The constant I in Eq. 24 was set to \(I = 5\). The simulation and model experiment results are shown in the following sections.

4.2.1 Simulation

The tracking results of Ctrl-w/-OBST and Ctrl-w/o-OBST on the berthing trajectories were compared in simulation. Simulations were terminated when a collision occurred or when the elapsed simulation time reached \(250 \ \textrm{s}\). Collision detection was conducted using the detection area defined in Sect. 2.2.1.

The tracking results in no wind conditions are shown in Figs. 10, 11, 12 and 13. Here, the initial y-axis error was \(3.0 \ \textrm{m}\), and the initial condition of the berthing trajectories was \(\varvec{\eta }_{\textrm{init},0}\).

In the case of Fig. 10, both controllers were able to absorb the initial error and track the desired trajectory. However, the simulation using Ctrl-w/o-OBST was terminated due to a collision. The collision occurred around the target pose nearest to the berth, and the error in the heading angle increased just before the collision. The bow thruster generated a starboard side force to reduce the y-axis error, and the rudder angles were selected to generate an astern force to reduce the ship speed. In particular, the upper and lower limits of the rudder angles (\(\delta _{\textrm{p}}=-105 \ \mathrm {deg.}, \delta _{\textrm{s}}=105 \ \mathrm {deg.}\)) continued to be selected, so the y-axis force was not controlled. As a result, the moment generated by the bow thruster was not canceled by the rudders, and this moment caused the error in the heading angle. In contrast, the simulation using Ctrl-w/-OBST finished without a collision, and the error in the heading angle was kept small.

In the case of Fig. 12, the simulation using Ctrl-w/o-OBST was also terminated due to a collision. In this case, although the controller brought the sway and yaw velocities close to zero, the collision occurred before the velocities reached zero. In other words, a small overshoot of \(x_{0}\) and \(\psi \) occurred and caused the collision. Therefore, in berthing maneuvers, collisions can occur due to small tracking errors even if the performance of the tracking controller is high. In the cases of Figs. 11 and 13, both controllers were able to track the berthing trajectory, and the simulations were completed without collision.

Fig. 10 Tracking results of the berthing trajectory in no wind condition. The terminal condition of the berthing trajectory is \(\varvec{\eta }_{\textrm{des},1}\)

Fig. 11 Tracking results of the berthing trajectory in no wind condition. The terminal condition of the berthing trajectory is \(\varvec{\eta }_{\textrm{des},2}\)

Fig. 12 Tracking results of the berthing trajectory in no wind condition. The terminal condition of the berthing trajectory is \(\varvec{\eta }_{\textrm{des},3}\)

Fig. 13 Tracking results of the berthing trajectory in no wind condition. The terminal condition of the berthing trajectory is \(\varvec{\eta }_{\textrm{des},4}\)

Fig. 14 A tracking result of the berthing trajectory using the Ctrl-w/-OBST. The terminal condition of the desired trajectory is \(\varvec{\eta }_{\textrm{des},2}\), and mean wind speed, \(\bar{U}_{\textrm{T}}\), is \(0.0 \ \mathrm {m/s}\)

However, the occurrence of collisions can vary due to factors such as berthing trajectories, wind disturbances, and initial tracking errors. Therefore, we compare Ctrl-w/-OBST and Ctrl-w/o-OBST in terms of collision probability during the berthing maneuvers to demonstrate the effectiveness of Ctrl-w/-OBST. The collision probability is defined as the probability of colliding with a static obstacle before the end of a given simulation.

The collision probability was computed by fixing the terminal condition of the berthing trajectory and the mean wind speed, and varying the initial condition of the berthing trajectory, the initial tracking error, and the mean wind direction. In other words, 100 trials of tracking control were conducted for each berthing trajectory by changing the initial tracking error and the mean wind direction, and the collision probability was derived by aggregating the results over the 11 initial conditions of the berthing trajectory. The initial tracking error follows a uniform distribution over the intervals listed in Table 4, and the mean wind direction and noise are determined as shown in Table 7. Owing to the potentially insufficient sample size, the collision probabilities were calculated five times, and their mean and standard deviation were computed. The mean values and standard deviations of the collision probabilities are listed in Table 11. Here, considering the wind pressure and the upper limit of the thrust that the subject ship can generate in bollard conditions, the collision probabilities were computed for mean wind speeds below \(1.5 \ \mathrm {m/s}\).

In the case where the Ctrl-w/o-OBST was used, we see that the collision probability was relatively high in the berthing trajectories whose terminal conditions are \(\varvec{\eta }_{\textrm{des},1}\) or \(\varvec{\eta }_{\textrm{des},3}\). Collision probabilities were high even at low mean wind speeds because small tracking errors caused collisions, as shown in Figs. 10b and 12b.

In the case where Ctrl-w/-OBST was used, we see that the collision probability is relatively high for the berthing trajectories whose terminal condition is \(\varvec{\eta }_{\textrm{des},2}\). One of the reasons for this is that the collision detection area on the bow side contacts an obstacle during turning, as shown in Fig. 14. A slight tracking error causes a collision during turning because these berthing trajectories were generated considering the collision detection area described in Sect. 2.2.1. The collision probabilities were therefore recalculated with the semi-major axis of the collision detection ellipse reduced to 0.5L; the obtained collision probabilities are listed in Table 12. Under these conditions, the collision probability is similar to those obtained for the other terminal conditions.

Comparing the collision probabilities of Ctrl-w/-OBST and Ctrl-w/o-OBST, we can see that the collision probabilities of Ctrl-w/-OBST are smaller than those of Ctrl-w/o-OBST, except when the terminal condition is \(\varvec{\eta }_{\textrm{des},2}\). However, in the case where the terminal condition is \(\varvec{\eta }_{\textrm{des},2}\), the collision probability for Ctrl-w/-OBST is comparable to those for other terminal conditions. Therefore, penalizing tracking errors that lead to collisions with obstacles in the reward function reduces the collision probability in the tracking control of the berthing trajectory.

Table 11 The mean values and standard deviations of the collision probabilities
Table 12 The mean values and standard deviations of the collision probabilities

Fig. 15 A result of model experiments tracking the berthing trajectory using the Ctrl-w/-OBST algorithm. The terminal condition of the desired trajectory was \(\varvec{\eta }_{\textrm{des},1}\), and wind conditions were \(\bar{U}_{\textrm{T}}=1.19 \ \mathrm {(m/s)}, \bar{\gamma }_{\textrm{T}}= 0.69\pi \ \mathrm {(rad)}\)

Fig. 16 A result of model experiments tracking the berthing trajectory using the Ctrl-w/-OBST algorithm. The terminal condition of the desired trajectory was \(\varvec{\eta }_{\textrm{des},2}\), and wind conditions were \(\bar{U}_{\textrm{T}}=0.60 \ \mathrm {(m/s)}, \bar{\gamma }_{\textrm{T}}=-0.62\pi \ \mathrm {(rad)}\)

4.2.2 Model experiment

Ctrl-w/-OBST was evaluated via model experiments carried out at Inukai pond of Osaka University. The measurement equipment of the model ship closely resembled that employed in a previous study [47]. In summary, \(x_{0}\) and \(y_{0}\) were measured through a Global Navigation Satellite System (GNSS), and \(\psi \) and r were measured by a fiber optic gyro (FOG). u and \(v_{\textrm{m}}\) were calculated from the speed over ground, the course over ground, and the heading angle, which were measured by the GNSS and FOG, respectively. The berthing trajectories with terminal conditions \(\varvec{\eta }_{\textrm{des},1}\) and \(\varvec{\eta }_{\textrm{des},2}\) were used, and the tracking results for these berthing trajectories are shown in Figs. 15 and 16, respectively.

In the case of Fig. 15, the model ship was able to track the berthing trajectory during the approach phase before \(t=100 \ \textrm{s}\). After \(t=100 \ \textrm{s}\), the model ship deviated from the berthing trajectory. However, it can be observed that Ctrl-w/-OBST attempted to bring the model ship closer to the desired trajectory by generating an astern force with the VecTwin rudder and a negative sway force with the bow thruster. Since the controller was commanding corrective forces, this tracking error can be attributed to an increase in wind speed.

In this case, the model ship reached the berthing point without collision. However, the probability of a collision would increase if the wind direction were different. Therefore, at this wind speed, a decision to suspend the berthing maneuver is required. Further research into the wind speeds that permit safe berthing maneuvers is needed.

Figure 16 shows that the Ctrl-w/-OBST could track a berthing trajectory, including turning, and reduce the vessel speed to a value close to the allowable velocity, \({\varvec{v}}_{\textrm{tol}}\), near the berthing point. In this case, the wind conditions were relatively calm.

5 Discussion

The simulation results of Sect. 4.2.1 show that the trained controllers were able to track the berthing trajectories. This means that the proposed reward function is effective for training the tracking controller and that the trajectories randomly generated during training were sufficient for the controllers to learn to track the prepared berthing trajectories.

The proposed method reduced the collision probability. Therefore, the proposed reward function and the generation method of static pseudo-obstacles are effective in reducing the tracking errors that may cause a collision. However, the collision probability remains too high for practical use. In this study, the wind disturbance was neither integrated into the controller feedback input nor considered in the trajectory planning. To improve the tracking performance and reduce the collision probability of berthing operations, one of our future tasks is to take the wind into account in both the tracking controller and the trajectory planning.

In practical use, the wind disturbance is not controllable. If the force of wind disturbance exceeds the maximum force that the actuators can generate, berthing operations will be difficult no matter how the actuators are controlled. Therefore, another future task is to investigate the threshold of the wind disturbance that makes berthing operations difficult.

Furthermore, to calculate the collision probability, we only evaluated whether a collision occurred before the elapsed time reached \(250 \ \textrm{s}\); the effect of fenders was not taken into account. For instance, collisions below a certain speed may be acceptable due to the presence of fenders. Thus, more detailed studies that include a definition of a successful berthing operation are required.

In Sect. 4.2.2, the trained controllers were able to track berthing trajectories even in model experiments, and in some cases were able to do so without collision. This study simplifies the modeling by using models proposed in previous studies. Specifically, although the hydrodynamic forces induced by the VecTwin rudder are complex, the VecTwin rudder was represented by the simplified model proposed in a previous study [38]. Process noise was introduced into the environment to compensate for the simplicity of the model, and as a result the trained controllers were able to track berthing trajectories in the model experiments. However, further study of the model to improve tracking performance is one of our future tasks.

In the proposed method, the controller was trained using obstacles consisting of line segments longer than the vessel length. However, actual harbors may have more complex shapes. Therefore, research related to training methods for obstacles consisting of shorter line segments is necessary.

6 Conclusion

This study proposed a training method based on RL for a trajectory tracking controller that reduces the probability of collision with static obstacles. The proposed training method requires a training environment that includes the maneuvering model of the subject ship. However, the desired trajectories for the tracking controller were randomly generated from a maneuvering simulation, and the static pseudo-obstacles were generated automatically. Therefore, it is not essential to prepare a large number of desired trajectories and obstacle information for training.

Moreover, this paper summarized the application of the obtained trajectory tracking controller to berthing maneuvers. The target harbor was the Inukai pond of Osaka University, and the berthing trajectories were generated by a trajectory planning method based on a time-minimizing problem. The desired state for the trained controllers was selected according to the berthing trajectories and the current state. Pseudo-obstacles were also introduced in the target harbor because it has more open space than the pseudo-obstacle environments generated in training.

To demonstrate the effectiveness of the proposed method, this paper showed the results of both simulations and model experiments related to the tracking of berthing trajectories in a target harbor. In the simulation environment, the trained controllers were able to track berthing trajectories, and the proposed method reduced the collision probability in berthing maneuvers. Furthermore, the trained controllers were able to track berthing trajectories without collisions, even in model experiments.