Abstract: Sectors such as logistics, insurance, and intermediary services need to make phone calls frequently, and dialing by hand is inefficient, so efficient telephone number recognition has significant practical value. Traditional methods for recognizing printed digits rely on complicated hand-crafted feature design and typically handle only a single font, which makes them hard to apply in practice. This paper proposes an interactive telephone number recognition method based on deep learning. When the user double-clicks a telephone number in an image, the method automatically crops the target area containing that number and applies a sequence of preprocessing operations: grayscale conversion, binarization, target area localization, character segmentation, and image padding. An improved LeNet-5 convolutional neural network (CNN) then learns image features automatically and recognizes printed digits in a variety of fonts, glyph styles, and font sizes. Interactive recognition and a memory pool are used to increase recognition speed. Experimental results show a recognition rate of 99.86% for single characters and 99.50% for whole telephone numbers, with an average recognition time of 91 ms per number. The proposed method is accurate and fast, and has fairly broad application prospects.

Keywords:
- deep learning
- convolutional neural network (CNN)
- telephone number recognition
- interactive recognition
- target area localization
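The preprocessing chain named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes dark digits on a light background, uses Otsu's threshold for binarization, splits characters at empty columns of the vertical projection, and pads each glyph onto a blank square resized to 32 x 32 (the classic LeNet-5 input size, assumed here). Target area localization around the double-click point is omitted, and all function names are hypothetical.

```python
import numpy as np

def to_gray(rgb):
    """Luminance-weighted grayscale of an H x W x 3 uint8 image."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb[..., :3].astype(np.float64) @ weights).round().astype(np.uint8)

def otsu_threshold(gray):
    """Gray level that maximizes the between-class variance (Otsu's criterion)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of levels <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0
    sigma_b[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Boolean ink mask; assumes the digits are darker than the background."""
    return gray <= otsu_threshold(gray)

def segment_columns(mask):
    """Column ranges (start, end) of characters, split at ink-free columns."""
    spans, start = [], None
    for x, ink in enumerate(mask.any(axis=0)):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, mask.shape[1]))
    return spans

def pad_and_resize(char_mask, size=32, margin=2):
    """Centre the glyph on a blank square, then nearest-neighbour resize."""
    rows = np.flatnonzero(char_mask.any(axis=1))
    cols = np.flatnonzero(char_mask.any(axis=0))
    glyph = char_mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = glyph.shape
    side = max(h, w) + 2 * margin
    canvas = np.zeros((side, side), dtype=bool)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = glyph
    idx = np.arange(size) * side // size      # source index for each output pixel
    return canvas[np.ix_(idx, idx)]

def preprocess(rgb):
    """Grayscale -> binarize -> segment -> pad; returns one mask per character."""
    mask = binarize(to_gray(rgb))
    return [pad_and_resize(mask[:, a:b]) for a, b in segment_columns(mask)]
```

Each returned mask would then be passed to the classifier. Padding to a square before resizing keeps the aspect ratio of narrow digits such as "1", which is one plausible reason for a padding step.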
Table 1. Ratio of digit height to inter-digit distance in SimSun font

| Font size | Lower bound (t_down) | Upper bound (t_up) |
|---|---|---|
| 9 | 1.33 | 2.67 |
| 10 | 1.13 | 2.25 |
| 11 | 1.00 | 3.33 |
| 12 | 1.10 | 3.67 |
| 14 | 1.30 | 4.33 |
| 16 | 1.67 | 3.50 |

Table 2. Ratio of digit height to inter-digit distance in SimHei font

| Font size | Lower bound (t_down) | Upper bound (t_up) |
|---|---|---|
| 9 | 1.33 | 4.00 |
| 10 | 1.29 | 4.50 |
| 11 | 1.25 | 2.50 |
| 12 | 1.57 | 3.67 |
| 14 | 1.30 | 2.60 |
| 16 | 1.36 | 3.00 |

Table 3. Threshold ranges of ten fonts

| Font | Threshold range |
|---|---|
| SimSun | 1.67 to 2.25 |
| SimHei | 1.57 to 2.50 |
| FangSong | 1.33 to 2.25 |
| KaiTi | 1.33 to 2.25 |
| Microsoft YaHei | 2.00 to 3.00 |
| LiSu | 1.38 to 2.50 |
| YouYuan | 1.10 to 2.25 |
| Times New Roman | 1.71 to 2.50 |
| Cambria | 2.00 to 3.00 |
| Calibri | 1.86 to 4.00 |

Table 4. Comparison of recognition speed between the proposed method and three software methods

| Recognition method | Average recognition time per telephone number (ms) |
|---|---|
| Proposed method | 91 |
| Tesseract-OCR v3.05 | 225 |
| Hanwang PDF OCR 8.0 | 383 |
| ABBYY FineReader 12 | 433 |
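The SimSun and SimHei entries in Table 3 coincide with the intersection of the per-size intervals in Tables 1 and 2 (largest lower bound, smallest upper bound). The text here does not state how the ranges were derived, so the sketch below treats that as an assumption and simply checks it against the tabulated values. The names `intersect`, `SIMSUN`, `SIMHEI`, and `FONT_RANGES` are hypothetical.

```python
# Per-size (t_down, t_up) bounds copied from Tables 1 and 2.
SIMSUN = {9: (1.33, 2.67), 10: (1.13, 2.25), 11: (1.00, 3.33),
          12: (1.10, 3.67), 14: (1.30, 4.33), 16: (1.67, 3.50)}
SIMHEI = {9: (1.33, 4.00), 10: (1.29, 4.50), 11: (1.25, 2.50),
          12: (1.57, 3.67), 14: (1.30, 2.60), 16: (1.36, 3.00)}

# Per-font threshold ranges copied from Table 3.
FONT_RANGES = {
    "SimSun": (1.67, 2.25), "SimHei": (1.57, 2.50),
    "FangSong": (1.33, 2.25), "KaiTi": (1.33, 2.25),
    "Microsoft YaHei": (2.00, 3.00), "LiSu": (1.38, 2.50),
    "YouYuan": (1.10, 2.25), "Times New Roman": (1.71, 2.50),
    "Cambria": (2.00, 3.00), "Calibri": (1.86, 4.00),
}

def intersect(intervals):
    """Return the (low, high) interval shared by all inputs, or None if empty."""
    intervals = list(intervals)
    low = max(lo for lo, _ in intervals)
    high = min(hi for _, hi in intervals)
    return (low, high) if low <= high else None

print(intersect(SIMSUN.values()))       # (1.67, 2.25), the SimSun row of Table 3
print(intersect(SIMHEI.values()))       # (1.57, 2.5), the SimHei row of Table 3
print(intersect(FONT_RANGES.values()))  # (2.0, 2.25), shared by all ten fonts
```

Under the same assumption, a single threshold between 2.00 and 2.25 would fall inside the range of every font listed in Table 3.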