DRL-PP

Murphy Lee

Deep reinforcement learning based path planning and collision avoidance for smart ships in complex environments

Introduction

Background

  • Intelligence in shipbuilding and shipping is a key trend enabling high-quality development in the post-pandemic era
    • Main components
      • Autonomous navigation
      • Automatic collision avoidance
      • Energy management systems

Significance

  • Intelligent navigation is a vital smart ship technology realized through advanced automation and navigation systems

    • Safety, efficiency, cost
      • optimizing ship speeds and routes
      • ensuring navigation safety
      • reducing fuel consumption and emissions
  • Traditional methods

    • struggle with path planning in random, complex environments, where factors such as environmental conditions and contingent events are difficult to quantify
    • Deep reinforcement learning handles abstract, difficult-to-quantify influences better than traditional approaches
  • Work basis

    Gao P, Zhou L, Zhao X, Shao B. Research on ship collision avoidance path planning based on modified potential field ant colony algorithm [J]. Ocean and Coastal Management, 2023, 235(3): 106482. https://doi.org/10.1016/j.ocecoaman.2023.106482

Problem Description

Ship collision avoidance problem description

  • Ship
    • own ship
    • target ship / multiple ships
    • obstacle (static, dynamic)
  • Collision avoidance
    • rules
    • state
    • action
  • Path planning
    • objective
    • constraints
    • method
    • AIS data
    • rules

Ship domain modeling based on AIS and rules

  • Grid Method

    Grid Method.png

  • Encounter Situation Classification

    Untitled

  • Level of ship collision risk

    CRL.png
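
As a rough illustration of the grid method and the encounter classification listed above, here is a minimal Python sketch. It is not the post's actual model: the function names, the flat east/north coordinate frame, and the 5° head-on half-width are assumptions made for illustration. The only sector limit taken from COLREGs is the stern sector of Rule 13 (more than 22.5° abaft the beam, i.e. relative bearings from 112.5° to 247.5°). A full classifier would also compare the two ships' headings.

```python
import math


def to_grid_cell(xy, origin_xy, cell_size_m):
    """Map an (east, north) position in metres to a (row, col) grid cell."""
    col = int((xy[0] - origin_xy[0]) // cell_size_m)
    row = int((xy[1] - origin_xy[1]) // cell_size_m)
    return row, col


def relative_bearing(own_xy, own_heading_deg, target_xy):
    """Bearing of the target, clockwise from the own ship's bow, in [0, 360)."""
    dx = target_xy[0] - own_xy[0]  # east offset
    dy = target_xy[1] - own_xy[1]  # north offset
    true_bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return (true_bearing - own_heading_deg) % 360.0


def classify_encounter(rel_bearing_deg, head_on_half_width=5.0):
    """Classify an encounter from the relative bearing alone.

    head_on_half_width is an assumed threshold, not a COLREGs value.
    """
    b = rel_bearing_deg % 360.0
    if b <= head_on_half_width or b >= 360.0 - head_on_half_width:
        return "head-on"
    if 112.5 <= b <= 247.5:
        # Target lies in the own ship's stern sector (Rule 13 geometry).
        return "overtaking"
    return "crossing"


if __name__ == "__main__":
    own = (0.0, 0.0)
    for target in [(0.0, 1000.0), (1000.0, 0.0), (0.0, -1000.0)]:
        bearing = relative_bearing(own, 0.0, target)
        print(target, to_grid_cell(target, own, 100.0), classify_encounter(bearing))
```

With the own ship heading north, a target dead ahead is classified as head-on, one abeam to starboard as crossing, and one dead astern as overtaking.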

Model and algorithm

Markov Decision Process

  • State Space

    CRL.png

  • Action Space

    CRL.png

  • Reward function

    • CRL.png
    • CRL.png
    • CRL.png
    • CRL.png
    • CRL.png
  • Total reward

    CRL.png

  • Return

    CRL.png
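
The total reward and the return can be sketched in a few lines of Python. The reward term names and the default unit weights below are hypothetical, since the individual reward formulas are only given as figures. The discount factor of 0.99 is the value listed in the hyperparameter table of this post.

```python
def total_reward(terms, weights=None):
    """Combine named reward terms into one scalar as a weighted sum.

    The term names and the default weight of 1.0 are illustrative assumptions.
    """
    if weights is None:
        weights = {name: 1.0 for name in terms}
    return sum(weights[name] * value for name, value in terms.items())


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    step_terms = {"goal": 10.0, "collision": -4.0}  # hypothetical terms
    print(total_reward(step_terms))
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

Computing the returns backwards over the episode avoids re-summing the tail of the reward sequence at every step.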

D3QN algorithm

  • Dueling Double DQN algorithm pseudocode

    CRL.png

  • to

    CRL.png

  • Adaptive decay greedy strategy

    CRL.png
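
The three ingredients named above can be sketched in plain Python, without a deep learning framework, to show the arithmetic only. The dueling aggregation and the double DQN target follow the standard definitions. The exploration schedule is a plain exponential decay used as a stand-in, because the post's adaptive decay rule is only given as a figure. All function names and the `eps_start`, `eps_end`, and `decay_steps` values are assumptions.

```python
import math
import random


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000.0):
    """Exponentially decayed exploration rate (stand-in for the adaptive rule)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay_steps)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    q = dueling_q(1.0, [1.0, 2.0, 3.0])
    print(q, select_action(q, epsilon_at(5000)))
    print(double_dqn_target(1.0, False, [0.2, 0.9], [5.0, 3.0], gamma=0.5))
```

Selecting the action with the online network and evaluating it with the target network is what separates the double DQN target from the plain DQN target, which takes the maximum of the target network's own estimates.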

Simulation Experiment

Parameter setting

  • Ship information
    Vessel      | Name          | Type            | Size (m)      | Tonnage (t)
    Own ship    | Hang Xing817  | Bulk cargo ship | 87-14.8-5.1   | 2114
    Target ship | Zhou Gong6006 | Bulk cargo ship | 67.8-16.0-5.2 | 2138
  • Hyperparameter
    Hyperparameter                  | Value
    Episode                         | 3000
    Learning rate                   | 1e-4
    Batch size                      | 256
    Target network update frequency | 3000
    Replay buffer                   | 100000
    Skit ratio                      | 0.02
    PER α                           | 0.6
    PER β                           | 0.4
    Warm start                      | 50
    Discount factor                 | 0.99
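
To show how these settings might be used, the sketch below collects the clearly labelled values in a dictionary and implements proportional prioritized experience replay (PER) sampling. It assumes the two PER values, 0.6 and 0.4, play their usual roles of priority exponent α and importance-sampling exponent β. The dictionary keys and function names are hypothetical, and priorities are assumed to be strictly positive.

```python
import random

# Values copied from the hyperparameter table; the key names are assumptions.
HYPERPARAMS = {
    "episodes": 3000,
    "learning_rate": 1e-4,
    "batch_size": 256,
    "target_update_frequency": 3000,
    "replay_buffer_size": 100000,
    "warm_start": 50,
    "discount_factor": 0.99,
}


def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 gives uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probabilities, beta=0.4):
    """w_i = (N * P(i))^(-beta), normalised so that the largest weight is 1."""
    n = len(probabilities)
    raw = [(n * p) ** (-beta) for p in probabilities]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_indices(probabilities, batch_size, rng=random):
    """Draw transition indices with replacement according to their probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=batch_size)


if __name__ == "__main__":
    priorities = [0.5, 1.0, 2.0, 4.0]  # e.g. absolute TD errors plus a small constant
    probs = sampling_probabilities(priorities)
    weights = importance_weights(probs)
    batch = sample_indices(probs, HYPERPARAMS["batch_size"], random.Random(0))
    print(probs, weights, len(batch))
```

Transitions with larger priorities are drawn more often, and the importance weights shrink their contribution to the loss to partly correct the resulting sampling bias.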

Collision avoidance experiments

  • Head-on

    Head-on.png

  • Crossing

    Crossing.png

  • Overtaking

    Overtaking.png

  • Reward

    reward.png

Conclusion

  • Ship collision avoidance path planning in dynamic environments involves a large state space and a complex action space because of uncertainty.
  • Based on AIS data and COLREGs, this paper designs rewards that evaluate multiple rule-compliant behaviors. An adaptive decay greedy exploration strategy is added to the D3QN algorithm with prioritized experience replay. Experiments in head-on, crossing, and overtaking scenarios demonstrate that the proposed method achieves superior results.
  • Further considerations / extensions
    • Multi-objective / different rewards
    • Complexity: uncertainty, dynamics, multiple ships (communication)
  • Title: DRL-PP
  • Author: Murphy Lee
  • Created at: 2023-09-19 17:09:16
  • Updated at : 2023-10-25 16:34:53
  • Link: https://redefine.ohevan.com/2023/09/19/DRLPP/
  • License: This work is licensed under CC BY-NC-SA 4.0.