Abstract: Existing robotic grasping systems demand expensive hardware, adapt poorly to different objects, and apply large harmful torques during grasping. To address these problems, this paper proposes a deep-learning-based visual detection and grasping method. YOLO-V3 is improved with a channel attention mechanism that strengthens the network's image feature extraction and improves object detection in complex environments; the average recognition rate increases by 0.32% over the unmodified network. To address the discreteness of estimated grasp angles, a minimum area bounding rectangle (MABR) algorithm is embedded in a Visual Geometry Group-16 (VGG-16) backbone network to estimate the grasp pose and refine the grasp angle. After this refinement, the average error between the predicted grasp angle and the actual angle of the target is less than 2.47°, which greatly reduces the additional harmful torque that the two-finger gripper applies to the object during grasping. A visual grasping system is built from a UR5 robotic arm, a pneumatic two-finger gripper, a Realsense D435 camera, and an ATI-Mini45 six-axis force/torque sensor. Experiments show that the proposed method grasps and sorts different objects effectively, places low demands on hardware, and reduces the harmful torque by about 75%, thereby reducing damage to the grasped objects and showing good application prospects.
Key words: deep learning / neural network / object detection / pose estimation / robotic grasping
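
The abstract does not spell out how the MABR step is computed. As a minimal sketch, assuming the object's contour points have already been extracted from the image, one common way to obtain a minimum-area bounding rectangle is to build the convex hull and try each hull edge as a candidate rectangle orientation. The names `convex_hull` and `min_area_rect` are illustrative and do not come from the paper, and reporting the direction of the rectangle's long side is an assumption about how the angle might be defined.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (center, (length, width), angle_deg) of the minimum-area bounding rectangle.

    angle_deg is the direction of the rectangle's long side, folded into [0, 180).
    """
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best = None
    n = len(hull)
    for i in range(n):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % n]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Express the hull in a frame whose first axis is parallel to this edge.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        du, dv = max(us) - min(us), max(vs) - min(vs)
        area = du * dv
        if best is None or area < best[0]:
            uc, vc = (max(us) + min(us)) / 2, (max(vs) + min(vs)) / 2
            center = (uc * c - vc * s, uc * s + vc * c)
            long_dir = theta if du >= dv else theta + math.pi / 2
            angle = math.degrees(long_dir) % 180.0
            best = (area, center, (max(du, dv), min(du, dv)), angle)
    return best[1], best[2], best[3]


# Demo: corners of a 4 x 2 rectangle centred at (10, 5), rotated by 30 degrees,
# plus one interior point that the hull step discards.
t = math.radians(30)
demo = [(10 + dx * math.cos(t) - dy * math.sin(t),
         5 + dx * math.sin(t) + dy * math.cos(t))
        for dx, dy in [(-2, -1), (2, -1), (2, 1), (-2, 1)]]
demo.append((10.0, 5.0))
center, size, angle = min_area_rect(demo)
print(center, size, angle)
```

Unlike a classifier that picks one of a fixed set of angle bins, the rectangle orientation here is continuous, which is the property the abstract relies on to remove the discreteness of the estimated angle. In image coordinates the vertical axis usually points down, so the sign convention of the angle would need to match the camera and gripper frames.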
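Table 1. Comparison of different network structures

| Network structure | Accuracy on Cornell dataset/% | Accuracy on experimental targets/% | Run time/s |
| --- | --- | --- | --- |
| Two-layer network, ResNet-50 | 91.30 | 87.11 | 0.932 |
| Single-layer network, ResNet-50 | 91.12 | 86.69 | 0.714 |
| Single-layer network, VGG-16 | 90.89 | 87.19 | 0.286 |
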
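Table 2. Pose estimation results

| Target | Grasp point (u, v)/pixel, before improvement | Grasp point (u, v)/pixel, after improvement | Grasp angle/(°), before improvement | Grasp angle/(°), after improvement | Actual angle of target/(°) |
| --- | --- | --- | --- | --- | --- |
| control board | (107, 112.2) | (107, 112.2) | 100 | 124 | 123 |
| hammer | (92.3, 109.3) | (92.3, 109.3) | 30 | 18 | 23 |
| shovel | (111.3, 108.5) | (111.3, 108.5) | 50 | 62 | 59 |
| wrench | (87.5, 132.5) | (87.5, 132.5) | 40 | 46 | 44 |
| scissors | (104.5, 118.3) | (104.5, 118.3) | 50 | 53 | 54 |
| pliers | (88.1, 114.5) | (88.1, 114.5) | 40 | 48 | 52 |
| umbrella | (88.5, 98.4) | (88.5, 98.4) | 30 | 35 | 35 |
| weight counter | (100.5, 136.2) | (100.5, 136.2) | 90 | 135 | 127 |
| stapler | (106.9, 104.7) | (106.9, 104.7) | 30 | 46 | 48 |
| solid glue | (98.2, 120.9) | (98.2, 120.9) | 40 | 45 | 45 |
| screwdriver | (83.6, 110.2) | (83.6, 110.2) | 130 | 161 | 162 |
| sponge | (84.6, 118.7) | (84.6, 118.7) | 180 | 13 | 14 |
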
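
The angle columns of Table 2 can be summarized as a mean absolute error before and after the improvement. The sketch below is one way to do that arithmetic; it assumes that a two-finger grasp angle repeats every 180° (so that 180° and 0° describe the same grasp), which the source does not state explicitly. The helper names are illustrative.

```python
# Rows from Table 2: (target, angle before, angle after, actual angle), in degrees.
TABLE2 = [
    ("control board", 100, 124, 123),
    ("hammer", 30, 18, 23),
    ("shovel", 50, 62, 59),
    ("wrench", 40, 46, 44),
    ("scissors", 50, 53, 54),
    ("pliers", 40, 48, 52),
    ("umbrella", 30, 35, 35),
    ("weight counter", 90, 135, 127),
    ("stapler", 30, 46, 48),
    ("solid glue", 40, 45, 45),
    ("screwdriver", 130, 161, 162),
    ("sponge", 180, 13, 14),
]


def angle_error(predicted, actual, period=180.0):
    """Smallest absolute difference between two angles that repeat every `period` degrees."""
    d = abs(predicted - actual) % period
    return min(d, period - d)


def mean_error(rows, column):
    """Mean absolute angle error of the given prediction column against the actual angle."""
    return sum(angle_error(r[column], r[3]) for r in rows) / len(rows)


mae_before = mean_error(TABLE2, 1)  # predictions before the improvement
mae_after = mean_error(TABLE2, 2)   # predictions after the improvement
print(f"mean error before: {mae_before:.2f} deg, after: {mae_after:.2f} deg")
```

For the twelve rows listed, this gives roughly 14.2° before and 2.3° after under the periodicity assumption. The abstract's bound of 2.47° may have been computed over a different set of trials.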
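Table 3. Experimental data of grasping

| No. | Grasp point (x, y, z)/mm, before improvement | Grasp point (x, y, z)/mm, after improvement | Grasp angle/(°), before improvement | Grasp angle/(°), after improvement | Actual angle of target/(°) | Grasp torque/(N·mm), before improvement | Grasp torque/(N·mm), after improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Experiment 1 | (153.41, −675.29, 102.35) | (153.41, −675.29, 102.35) | 50 | 52 | 53 | 4.0 | 2.3 |
| Experiment 2 | (131.22, −603.70, 99.16) | (131.22, −603.70, 99.16) | 50 | 58 | 58 | 9.5 | 0.3 |
| Experiment 3 | (161.96, −558.44, 102.71) | (161.96, −558.44, 102.71) | 140 | 157 | 156 | 15.0 | 2.6 |
| Experiment 4 | (111.15, −574.50, 98.79) | (111.15, −574.50, 98.79) | 10 | 21 | 19 | 8.0 | 5.0 |
| Experiment 5 | (114.63, −732.19, 96.96) | (114.63, −732.19, 96.96) | 30 | 39 | 47 | 19.0 | 11.0 |
| Experiment 6 | (102.68, −657.68, 100.41) | (102.68, −657.68, 100.41) | 40 | 51 | 50 | 10.6 | 2.5 |
| Experiment 7 | (127.41, −675.63, 100.53) | (127.41, −675.63, 100.53) | 40 | 46 | 45 | 4.0 | 1.5 |
| Experiment 8 | (155.39, −597.67, 105.50) | (155.39, −597.67, 105.50) | 50 | 53 | 55 | 8.0 | 3.8 |
| Experiment 9 | (176.65, −690.90, 103.57) | (176.65, −690.90, 103.57) | 90 | 111 | 113 | 17.5 | 4.0 |
| Experiment 10 | (131.77, −739.27, 100.34) | (131.77, −739.27, 100.34) | 100 | 112 | 112 | 12.5 | 0 |
| Experiment 11 | (194.20, −687.68, 101.49) | (194.20, −687.68, 101.49) | 30 | 36 | 35 | 5.0 | 2.5 |
| Experiment 12 | (127.47, −590.49, 100.38) | (127.47, −590.49, 100.38) | 30 | 63 | 63 | 25.0 | 0 |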
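
The torque columns of Table 3 can be checked against the roughly 75% reduction quoted in the abstract. The sketch below compares the summed torque over the twelve trials before and after the improvement; whether the authors used this aggregate or a per-trial average is not stated, so the choice of formula is an assumption.

```python
# Grasp torque in N·mm for Experiments 1 to 12 of Table 3: (before, after).
TORQUES = [
    (4.0, 2.3), (9.5, 0.3), (15.0, 2.6), (8.0, 5.0),
    (19.0, 11.0), (10.6, 2.5), (4.0, 1.5), (8.0, 3.8),
    (17.5, 4.0), (12.5, 0.0), (5.0, 2.5), (25.0, 0.0),
]

total_before = sum(before for before, _ in TORQUES)
total_after = sum(after for _, after in TORQUES)

# Relative reduction of the summed (equivalently, the mean) harmful torque.
reduction = 1.0 - total_after / total_before

print(f"mean torque before: {total_before / len(TORQUES):.2f} N·mm")
print(f"mean torque after:  {total_after / len(TORQUES):.2f} N·mm")
print(f"reduction: {reduction:.1%}")
```

With the values listed, the summed torque drops from about 138.1 N·mm to 35.5 N·mm, a reduction of about 74%, which is consistent with the figure of about 75% in the abstract.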
