Traffic signal timing method based on deep reinforcement learning and extended Kalman filter

WU Lan, WU Yuanming, KONG Fanshi, LI Binquan

Citation: WU Lan, WU Yuanming, KONG Fanshi, et al. Traffic signal timing method based on deep reinforcement learning and extended Kalman filter[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(8): 1353-1363. doi: 10.13700/j.bh.1001-5965.2021.0529 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2021.0529

Funds:

    National Natural Science Foundation of China 61973103

    Soft Science Research Program of Henan Province 212400410005

Corresponding author:

    WU Lan, E-mail: wulan@haut.edu.cn

  • CLC number: V221+.3; TB553
  • Abstract:

    The deep Q-learning network (DQN), with its strong perception and decision-making capabilities, has become an effective approach to the traffic signal timing problem. However, parameter uncertainty caused by external environmental disturbances and internal parameter fluctuations limits its further development in traffic signal timing systems. To address this, a traffic signal timing method combining DQN with the extended Kalman filter (EKF), termed DQN-EKF, is proposed. The uncertain parameter values of the estimation network are taken as the state variables, and the target network values containing the uncertain parameters are taken as the observation variables; the EKF system equations are constructed from the process noise, the estimation network values containing the uncertain parameters, and the system observation noise. Through iterative EKF updates, the optimal estimates of the true parameters of the DQN model are obtained, which resolves the parameter uncertainty problem in the DQN model. Experimental results show that the DQN-EKF timing method adapts to different traffic environments and effectively improves vehicle throughput efficiency.
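    As a concrete reading of the construction described in the abstract, the sketch below applies one EKF correction step to the parameters of a small Q-network: the estimation-network parameters act as the EKF state, the target-network (Bellman) value as the observation, and process and observation noise close the system equations. This is a minimal illustrative sketch, not the authors' implementation; the network size, noise covariances, finite-difference Jacobian, and all function names are assumptions.

```python
import numpy as np

STATE_DIM, N_ACTIONS, HIDDEN = 4, 2, 8
GAMMA = 0.9  # discount factor, matching Table 1

def unpack(theta):
    """Reshape the flat parameter vector into the two MLP weight matrices."""
    w1 = theta[:STATE_DIM * HIDDEN].reshape(STATE_DIM, HIDDEN)
    w2 = theta[STATE_DIM * HIDDEN:].reshape(HIDDEN, N_ACTIONS)
    return w1, w2

def q_values(theta, s):
    """Q(s, .; theta): one-hidden-layer network, nonlinear in theta (hence EKF)."""
    w1, w2 = unpack(theta)
    return np.tanh(s @ w1) @ w2

def jacobian(theta, s, a, eps=1e-5):
    """Finite-difference Jacobian of the scalar Q(s, a; theta) w.r.t. theta."""
    H = np.zeros((1, theta.size))
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        H[0, i] = (q_values(theta + d, s)[a] - q_values(theta - d, s)[a]) / (2 * eps)
    return H

def ekf_update(theta, P, transition, theta_target, q_noise=1e-4, r_noise=1e-2):
    """One EKF correction: theta is the state, the Bellman target is the observation."""
    s, a, r, s_next, done = transition
    # Prediction step: random-walk parameter model plus process noise covariance.
    P_pred = P + q_noise * np.eye(theta.size)
    # Observation: target-network (Bellman) value, assumed corrupted by noise R.
    z = r + (0.0 if done else GAMMA * np.max(q_values(theta_target, s_next)))
    h = q_values(theta, s)[a]              # predicted observation h(theta)
    H = jacobian(theta, s, a)              # linearisation of h around theta
    S = H @ P_pred @ H.T + r_noise         # innovation covariance (1 x 1)
    K = P_pred @ H.T / S                   # Kalman gain (n_params x 1)
    theta = theta + (K * (z - h)).ravel()  # corrected parameter estimate
    P = (np.eye(theta.size) - K @ H) @ P_pred
    return theta, P

# Toy usage: a single update from one synthetic transition.
rng = np.random.default_rng(0)
n_params = STATE_DIM * HIDDEN + HIDDEN * N_ACTIONS
theta = rng.normal(scale=0.1, size=n_params)   # estimation-network parameters
theta_target = theta.copy()                    # target-network parameters
P = np.eye(n_params)                           # parameter error covariance
transition = (rng.normal(size=STATE_DIM), 0, 1.0, rng.normal(size=STATE_DIM), False)
theta, P = ekf_update(theta, P, transition, theta_target)
```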

     

  • Figure 1.  Overall framework of the model

    Figure 2.  Traffic state information representation

    Figure 3.  Phase change diagram of crossroads

    Figure 4.  Phase change diagram of T-junction

    Figure 5.  Structure of combination of DQN and EKF

    Figure 6.  Variation of average waiting time under normal traffic flow in Scenario 1

    Figure 7.  Average queue length under normal traffic flow in Scenario 1

    Figure 8.  Variation of average waiting time under peak traffic flow in Scenario 1

    Figure 9.  Average queue length under peak traffic flow in Scenario 1

    Figure 10.  Variation of average waiting time under normal traffic flow in Scenario 2

    Figure 11.  Average queue length under normal traffic flow in Scenario 2

    Figure 12.  Variation of average waiting time under peak traffic flow in Scenario 2

    Figure 13.  Average queue length under peak traffic flow in Scenario 2

    Figure 14.  Variation of average waiting time under normal traffic flow in Scenario 3

    Figure 15.  Average queue length under normal traffic flow in Scenario 3

    Figure 16.  Variation of average waiting time under peak traffic flow in Scenario 3

    Figure 17.  Average queue length under peak traffic flow in Scenario 3

    Table 1.  Parameter setting

    Parameter                        Value
    Batch size B                     32
    Discount factor γ                0.9
    Learning rate α                  0.001
    Experience replay pool size N    10 000
    Exploration rate ε               1.0 → 0.01
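    For reference, the settings in Table 1 could be collected into a training configuration such as the sketch below. The field names and the linear annealing of ε from 1.0 to 0.01 are assumptions for illustration; the table only lists the start and end values of the exploration rate.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    """Hyperparameters from Table 1; the epsilon schedule shape is assumed."""
    batch_size: int = 32           # batch size B
    gamma: float = 0.9             # discount factor
    learning_rate: float = 0.001   # learning rate alpha
    replay_size: int = 10_000      # experience pool size N
    eps_start: float = 1.0         # initial exploration rate
    eps_end: float = 0.01          # final exploration rate
    eps_decay_steps: int = 10_000  # assumed annealing horizon (not given in Table 1)

    def epsilon(self, step: int) -> float:
        """Linearly anneal epsilon from eps_start down to eps_end."""
        frac = min(step / self.eps_decay_steps, 1.0)
        return self.eps_start + frac * (self.eps_end - self.eps_start)
```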

    Table 2.  Comparison of algorithm performance under normal traffic flow in Scenario 1

    Algorithm       Average waiting time/s    Average queue length/m
    DQN             33.70                     36.88
    Dueling DQN     30.87                     34.31
    Double DQN      29.02                     31.47
    DQN-EKF         25.81                     30.87

    Table 3.  Comparison of algorithm performance under peak traffic flow in Scenario 1

    Algorithm       Average waiting time/s    Average queue length/m
    DQN             36.81                     50.36
    Dueling DQN     35.16                     47.68
    Double DQN      32.92                     45.04
    DQN-EKF         28.63                     40.17

    Table 4.  Comparison of algorithm performance under normal traffic flow in Scenario 2

    Algorithm       Average waiting time/s    Average queue length/m
    DQN             36.03                     45.74
    Dueling DQN     34.28                     40.81
    Double DQN      31.75                     37.33
    DQN-EKF         27.46                     33.28

    Table 5.  Comparison of algorithm performance under peak traffic flow in Scenario 2

    Algorithm       Average waiting time/s    Average queue length/m
    DQN             37.98                     55.42
    Dueling DQN     35.84                     50.84
    Double DQN      33.70                     48.23
    DQN-EKF         30.38                     44.19

    Table 6.  Comparison of algorithm performance under normal traffic flow in Scenario 3

    Algorithm       Average waiting time/s    Average queue length/m
    DQN             26.68                     32.18
    Dueling DQN     24.54                     28.44
    Double DQN      22.50                     27.29
    DQN-EKF         19.96                     22.77

    Table 7.  Comparison of algorithm performance under peak traffic flow in Scenario 3

    Algorithm       Average waiting time/s    Average queue length/m
    DQN             32.62                     44.36
    Dueling DQN     29.90                     39.78
    Double DQN      26.59                     37.64
    DQN-EKF         23.66                     32.11
Publication history
  • Received:  2021-09-06
  • Accepted:  2021-09-17
  • Published:  2021-10-12
