DRL-PP
Deep reinforcement learning based path planning and collision avoidance for smart ships in complex environments
Introduction
Background
- Intelligence in shipbuilding and shipping is a key trend enabling high-quality development in the post-pandemic era
- Main components
- Autonomous navigation
- Automatic collision avoidance
- Energy management systems
Significance
Intelligent navigation is a vital smart ship technology realized through advanced automation and navigation systems
Safety, efficiency, and cost
- optimizing ship speeds and routes
- ensuring navigation safety
- reducing fuel consumption and emissions
Traditional methods
- have limitations for path planning in random, complex environments, which involve difficult-to-quantify factors such as environmental disturbances and contingent uncertainties
- Deep reinforcement learning handles such abstract, difficult-to-quantify influences better than traditional approaches
Work basis
Gao P, Zhou L, Zhao X, Shao B. Research on ship collision avoidance path planning based on modified potential field ant colony algorithm [J]. Ocean & Coastal Management, 2023, 235: 106482. https://doi.org/10.1016/j.ocecoaman.2023.106482
Problem Description
Ship collision avoidance problem description
- Ship
- own ship
- target ship / multiple ships
- obstacles (static, dynamic)
- Collision avoidance
- rules
- state
- action
- Path planning
- objective
- constraints
- method
- AIS data
- rules
Ship domain modeling based on AIS and rules
Grid Method
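A minimal sketch of the grid method, assuming the navigable water is rasterized into a binary occupancy grid; the cell size, circular obstacle representation, and function names are illustrative, not the paper's:

```python
import numpy as np

def build_occupancy_grid(obstacles, width_m, height_m, cell_m=10.0):
    """Rasterize static circular obstacles into a binary occupancy grid.

    `obstacles` is a list of (x, y, radius) tuples in metres; the
    paper's actual obstacle model may differ.
    """
    rows, cols = int(height_m // cell_m), int(width_m // cell_m)
    grid = np.zeros((rows, cols), dtype=np.uint8)
    ys, xs = np.mgrid[0:rows, 0:cols]
    cx, cy = (xs + 0.5) * cell_m, (ys + 0.5) * cell_m  # cell centres (m)
    for ox, oy, r in obstacles:
        grid[(cx - ox) ** 2 + (cy - oy) ** 2 <= r ** 2] = 1  # mark occupied cells
    return grid
```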
Encounter Situation Classification
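The three COLREGs encounter situations can be distinguished coarsely by the target ship's relative bearing. The sector thresholds below (about ±6° for head-on, abaft 112.5° for overtaking) follow common readings of the rules and are an assumption; a strict classification also checks the ships' relative courses:

```python
def classify_encounter(rel_bearing_deg):
    """Coarse encounter classification from the target ship's relative
    bearing (degrees clockwise from own bow, 0-360)."""
    b = rel_bearing_deg % 360.0
    if b <= 6.0 or b >= 354.0:
        return "head-on"       # target nearly dead ahead
    if 112.5 < b < 247.5:
        return "overtaking"    # target coming up from abaft the beam
    return "crossing"
```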
Level of ship collision risk
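Collision risk is conventionally graded from DCPA and TCPA, the distance and time to the closest point of approach. Below is a sketch of the standard relative-motion computation; the paper's exact risk index and thresholds are not reproduced here:

```python
import numpy as np

def dcpa_tcpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """DCPA (m) and TCPA (s) between two ships, from 2-D positions (m)
    and velocities (m/s)."""
    rel_pos = np.asarray(tgt_pos, float) - np.asarray(own_pos, float)
    rel_vel = np.asarray(tgt_vel, float) - np.asarray(own_vel, float)
    speed2 = rel_vel @ rel_vel
    if speed2 < 1e-9:                       # same velocity: range never changes
        return float(np.linalg.norm(rel_pos)), float("inf")
    tcpa = -(rel_pos @ rel_vel) / speed2    # time of closest approach
    dcpa = float(np.linalg.norm(rel_pos + rel_vel * max(tcpa, 0.0)))
    return dcpa, float(tcpa)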
Model and algorithm
Markov Decision Process
State Space
Action Space
Reward function
Total reward
Return
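One plausible formulation of the two items above, assuming the total reward is a weighted sum of sub-rewards; the weights $w_i$ and the sub-terms are illustrative, the paper defines the actual shaping:

```latex
% Per-step reward as a weighted sum of goal/rule/safety sub-rewards
r_t = w_1\, r_t^{\mathrm{goal}} + w_2\, r_t^{\mathrm{COLREGs}} + w_3\, r_t^{\mathrm{collision}}

% Discounted return the agent maximizes, with discount factor \gamma = 0.99
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}
```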
D3QN algorithm
Dueling Double DQN algorithm pseudocode
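A minimal PyTorch sketch of the two ingredients the name D3QN combines: a dueling Q-network head and the double-DQN bootstrap target. Layer sizes and names are assumptions; the paper additionally uses prioritized experience replay, omitted here:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream

    def forward(self, s):
        h = self.feature(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

@torch.no_grad()
def double_dqn_target(online, target, r, s_next, done, gamma=0.99):
    """Double-DQN target: the online net selects the next action,
    the target net evaluates it (decouples selection from evaluation)."""
    a_star = online(s_next).argmax(dim=1, keepdim=True)
    q_next = target(s_next).gather(1, a_star).squeeze(1)
    return r + gamma * (1.0 - done.float()) * q_next
```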
Adaptive decaying ε-greedy strategy
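One plausible reading of the adaptive decay schedule, assuming an exponential interpolation from eps_max down to eps_min over training; the paper's exact schedule may differ:

```python
import numpy as np

def adaptive_epsilon(episode, total_episodes=3000, eps_max=1.0, eps_min=0.05):
    """Exponentially decaying exploration rate for epsilon-greedy."""
    decay = np.log(eps_max / eps_min) / total_episodes
    return eps_min + (eps_max - eps_min) * np.exp(-decay * episode)

def select_action(q_values, epsilon, rng=np.random.default_rng()):
    # Explore with probability epsilon, otherwise act greedily on Q-values.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```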
Simulation Experiment
Parameter setting
- Ship information
| Vessel | Name | Type | Size (m) | Tonnage (t) |
| --- | --- | --- | --- | --- |
| Own ship | Hang Xing 817 | Bulk cargo ship | 87 × 14.8 × 5.1 | 2114 |
| Target ship | Zhou Gong 6006 | Bulk cargo ship | 67.8 × 16.0 × 5.2 | 2138 |
- Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Episodes | 3000 |
| Learning rate | 1e-4 |
| Batch size | 256 |
| Target network update frequency | 3000 |
| Replay buffer size | 100000 |
| Skit ratio | 0.02 |
| PER α | 0.6 |
| PER β | 0.4 |
| Warm start | 50 |
| Discount factor | 0.99 |
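The same settings as a Python config for reference; the two PER entries are assumed to be the standard priority exponent α and importance-sampling exponent β of prioritized experience replay:

```python
# Training configuration mirroring the table above (names are illustrative).
CONFIG = dict(
    episodes=3000,
    learning_rate=1e-4,
    batch_size=256,
    target_update_freq=3000,      # steps between target-network syncs
    replay_buffer_size=100_000,
    per_alpha=0.6,                # PER priority exponent (assumed)
    per_beta=0.4,                 # PER importance-sampling exponent (assumed)
    warm_start=50,                # warm-start episodes before learning (assumed unit)
    gamma=0.99,                   # discount factor
)
```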
Collision avoidance experiments
Head-on
Crossing
Overtaking
Reward
Conclusion
- Ship collision avoidance path planning in dynamic environments faces large state spaces and complex action spaces arising from uncertainty.
- Based on AIS data and COLREGs, this paper designs reward functions that evaluate multiple rule-compliant behaviors, and introduces an adaptive decaying ε-greedy exploration strategy on top of the prioritized-experience-replay D3QN algorithm. Experiments across various collision avoidance scenarios show that the proposed method achieves superior results.
- Further considerations / extensions
- Multi-objective optimization / alternative reward designs
- Complexity: uncertainty, dynamic environments, multi-ship scenarios (communication)