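A deep learning based interactive recognition method for telephone numbers

HAN Jingye¹, XU Fu¹, CHEN Zhibo¹, LIU Hui²

doi: 10.13700/j.bh.1001-5965.2017.0357

1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
2. School of Computer Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China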

Funds:

National Natural Science Foundation of China 61772078

Key R&D Program of Beijing D171100001817003

Fundamental Research Funds for the Central Universities YX2014-17

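Author biographies:
HAN Jingye: male, master's student. Research interests: image processing, deep learning

XU Fu: male, Ph.D., associate professor. Research interests: image processing, compiler technology, software engineering

CHEN Zhibo: male, Ph.D., professor, doctoral supervisor. Research interests: database technology, forestry information engineering

LIU Hui: male, Ph.D. Research interests: software engineering

Corresponding author:
XU Fu, E-mail: xufu@bjfu.edu.cn

CLC number: TP391.43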
Publication history
- Received: 2017-05-27
- Accepted: 2017-06-23
- Published online: 2018-05-20

Abstract:
Sectors such as logistics, insurance, and intermediary services need to make telephone calls frequently, and dialing manually is inefficient, so efficient telephone number recognition has important practical value. Traditional methods for recognizing printed digits rely on a complicated manual feature-design process and handle only a single font, which makes it hard for them to meet practical requirements. This paper proposes an interactive telephone number recognition method based on deep learning. When the user double-clicks a telephone number in an image, the method automatically crops the target area containing the number and applies a sequence of preprocessing operations: grayscale conversion, binarization, target area localization, character segmentation, and image padding. An improved LeNet-5 convolutional neural network (CNN) then learns image features automatically, which supports the recognition of printed digits in a variety of fonts, glyph styles, and font sizes. Interactive recognition and a memory pool are used to increase recognition speed. Experimental results show that the recognition accuracy is 99.86% for a single character and 99.50% for a whole telephone number, and that the average recognition time for a whole number is 91 ms. The proposed method is accurate and fast, and has fairly broad application prospects.
- deep learning /
- convolutional neural network (CNN) /
- telephone number recognition /
- interactive recognition /
- target area localization
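
The preprocessing steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, the luminance weights are the common ITU-R BT.601 values, binarization is assumed to use Otsu's method, the text is assumed to be dark on a light background, and padding is assumed to mean centering each glyph on a square canvas.

```python
# Illustrative sketch only: all function names are hypothetical and the
# choices below (BT.601 weights, Otsu threshold, square padding) are assumptions.

def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 luminance values."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold t that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, t):
    """Mark dark pixels (value <= t) as foreground 1, everything else as 0."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def crop_to_foreground(binary):
    """Crop to the bounding box of all foreground pixels."""
    rows = [i for i, row in enumerate(binary) if any(row)]
    cols = [j for j in range(len(binary[0])) if any(row[j] for row in binary)]
    if not rows:
        return binary  # nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def pad_to_square(binary, margin=2):
    """Center the glyph on a square background canvas with `margin` extra pixels."""
    h, w = len(binary), len(binary[0])
    side = max(h, w) + 2 * margin
    top, left = (side - h) // 2, (side - w) // 2
    canvas = [[0] * side for _ in range(side)]
    for i in range(h):
        canvas[top + i][left:left + w] = binary[i]
    return canvas


# Usage: a 6x5 white image with one dark vertical stroke, like the digit "1".
white, black = (255, 255, 255), (0, 0, 0)
image = [[white] * 5 for _ in range(6)]
for i in range(1, 5):
    image[i][2] = black

gray = to_gray(image)
binary = binarize(gray, otsu_threshold(gray))
glyph = pad_to_square(crop_to_foreground(binary), margin=1)
for row in glyph:
    print("".join("#" if v else "." for v in row))
```

Padding to a square before resizing keeps a narrow digit such as "1" from being stretched to fill the network input, which is one plausible reason for a padding step; the paper's exact padding rule is not given in this section.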

Figure 1. Preprocessing flowchart

Figure 2. Original image

Figure 3. Image of target area containing telephone numbers

Figure 4. Grayscale image

Figure 5. Binary image

Figure 6. Cropped image

Figure 7. Comparison of image before and after padding

Figure 8. Improved LeNet-5 CNN structure

Figure 9. Experimental flowchart

Figure 10. Dataset generator

Figure 11. Loss and recognition accuracy curves of model training

Figure 12. An example of recognition speed test

Table 1. Ratio of digit height to inter-digit distance in SimSun font

Font size	Lower bound (t_down)	Upper bound (t_up)
9	1.33	2.67
10	1.13	2.25
11	1.00	3.33
12	1.10	3.67
14	1.30	4.33
16	1.67	3.50

Table 2. Ratio of digit height to inter-digit distance in SimHei font

Font size	Lower bound (t_down)	Upper bound (t_up)
9	1.33	4.00
10	1.29	4.50
11	1.25	2.50
12	1.57	3.67
14	1.30	2.60
16	1.36	3.00

Table 3. Threshold ranges of ten fonts

Font	Threshold range
SimSun	1.67~2.25
SimHei	1.57~2.50
FangSong	1.33~2.25
KaiTi	1.33~2.25
Microsoft YaHei	2.00~3.00
LiSu	1.38~2.50
YouYuan	1.10~2.25
Times New Roman	1.71~2.50
Cambria	2.00~3.00
Calibri	1.86~4.00
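
The SimSun and SimHei rows of Table 3 coincide with the largest lower bound and the smallest upper bound across the font sizes in Tables 1 and 2. The sketch below reproduces those two rows under the assumption that a font's threshold range is the intersection of its per-size intervals; this section does not state the derivation, and the helper name `common_range` is hypothetical.

```python
# Ratio bounds (t_down, t_up) per font size, copied from Tables 1 and 2.
SIMSUN = {9: (1.33, 2.67), 10: (1.13, 2.25), 11: (1.00, 3.33),
          12: (1.10, 3.67), 14: (1.30, 4.33), 16: (1.67, 3.50)}
SIMHEI = {9: (1.33, 4.00), 10: (1.29, 4.50), 11: (1.25, 2.50),
          12: (1.57, 3.67), 14: (1.30, 2.60), 16: (1.36, 3.00)}


def common_range(bounds):
    """Intersect all (t_down, t_up) intervals; return None if they do not overlap."""
    low = max(t_down for t_down, _ in bounds.values())
    high = min(t_up for _, t_up in bounds.values())
    return (low, high) if low <= high else None


print(common_range(SIMSUN))  # (1.67, 2.25), matching the SimSun row of Table 3
print(common_range(SIMHEI))  # (1.57, 2.5), matching the SimHei row of Table 3
```

A threshold chosen inside such a range satisfies the bounds for every listed font size of that font at once.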

Table 4. Comparison of recognition speed between the proposed method and three OCR software packages

Recognition method	Average recognition time per telephone number (ms)
Proposed method	91
Tesseract-OCR v3.05	225
Hanwang PDF OCR 8.0	383
ABBYY FineReader 12	433
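
As a quick reading aid, the times in Table 4 can be converted into speedup factors relative to the proposed method. The snippet is plain arithmetic on the reported averages; the variable names are illustrative.

```python
# Average recognition time per telephone number in ms, copied from Table 4.
TIMES_MS = {
    "Proposed method": 91,
    "Tesseract-OCR v3.05": 225,
    "Hanwang PDF OCR 8.0": 383,
    "ABBYY FineReader 12": 433,
}

baseline = TIMES_MS["Proposed method"]
# How many times longer each package takes than the proposed method.
speedups = {name: round(t / baseline, 2)
            for name, t in TIMES_MS.items() if name != "Proposed method"}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

On these figures the proposed method is roughly 2.5 to 4.8 times faster than the three packages.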
