Abstract: The industrial Internet is one of the most closely watched topics in industrial informatization, and the management of massive heterogeneous data is one of its central problems. Traditional relational databases (RDB) hit performance bottlenecks when reading, writing, and retrieving massive multi-source heterogeneous data, while recent cloud data management methods mainly target the key-value (K-V) model and cannot quickly look up data by any attribute other than the primary key. This paper proposes StoreCDB, a cloud storage method for the industrial Internet. In StoreCDB, heterogeneous sampled data are first expressed in a uniform data model and then stored as unstructured data on a distributed file system within a parallel architecture. In addition, a two-level index supports both key-value queries and RDB-style queries, enabling fast retrieval over massive data. Experiments on a distributed cluster platform, using massive simulated high-speed train operation data, show that StoreCDB stores and retrieves heterogeneous data efficiently and offers a new data management method for the industrial Internet.
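The uniform record shape and the attribute lookup described above can be sketched in a few lines. This is a minimal illustration, not StoreCDB's implementation: the names `Sample`, `com_idx`, and `query_rows` are hypothetical, the data live in memory rather than on a distributed file system, and the rule that a multi-attribute query intersects row-ID sets is an assumption inferred from the sample values in Tables 1 to 3.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One reading in the uniform (sensor, tag, value, time) form of Table 1."""
    sensor: str
    tag: str
    value: float
    time: str


# Heterogeneous sensors reduced to one record shape
# (values from Table 1, tags translated to English).
samples = [
    Sample("s08", "operating speed", 215, "t1"),
    Sample("s10", "travel speed", 216, "t2"),
    Sample("s401", "bogie axle temperature", 83, "t1"),
    Sample("s401", "bogie axle temperature", 85, "t2"),
]

# Attribute index entries copied from Table 2: (tagName, pValue) -> rowIds.
com_idx = {
    ("velocity", "135 km/h"): {289, 390},
    ("velocity", "136 km/h"): {190, 310},
    ("temperature", "83℃"): {367, 390},
    ("temperature", "85℃"): {390, 463},
    ("operation", "rw"): {278, 290},
    ("operation", "c"): {390, 463},
}


def query_rows(index, conditions):
    """Return the rowIds that satisfy every (tagName, pValue) condition.

    Assumption: conditions are ANDed by intersecting their rowId sets.
    """
    hits = [index.get(cond, set()) for cond in conditions]
    return set.intersection(*hits) if hits else set()


# The three conditions listed in Table 3.
result = query_rows(com_idx, [
    ("velocity", "135 km/h"),
    ("temperature", "83℃"),
    ("operation", "c"),
])
print(result)  # {390}
```

Under this reading, row 390 is the only row that appears in all three index entries of Table 3, so a lookup by non-key attributes reduces to a few set intersections instead of a scan.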
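Table 1. Examples of sensor sample values

| Sensor name | Sensor type tag | Data record |
|---|---|---|
| s08 | operating speed | (s08, operating speed, 215, t1) |
| s10 | travel speed | (s10, travel speed, 216, t2) |
| s401 | bogie axle temperature | (s401, bogie axle temperature, 83, t1) |
| s401 | bogie axle temperature | (s401, bogie axle temperature, 85, t2) |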
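Table 2. Example of a one-dimensional projection of the attribute index (comIdx)

| Attribute (tagName) | Attribute value (pValue) | Data row numbers (rowId) |
|---|---|---|
| velocity | 135 km/h | 289, 390 |
| velocity | 136 km/h | 190, 310 |
| temperature | 83℃ | 367, 390 |
| temperature | 85℃ | 390, 463 |
| operation | rw | 278, 290 |
| operation | c | 390, 463 |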
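Table 3. Query instance on the attribute index (comIdx)

| Attribute (tagName) | Attribute value (pValue) | Data row numbers (rowId) |
|---|---|---|
| velocity | 135 km/h | 289, 390 |
| temperature | 83℃ | 367, 390 |
| operation | c | 390, 463 |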
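Table 4. Experimental parameters

| Parameter | Value |
|---|---|
| Number of records in the data source, N_Src (rows) | 2×10^10 |
| Average interval between samples, frequency (s) | 5 |
| Number of master node servers, N_masterNodes | 1 |
| Number of worker node servers, N_workerNodes | 1 to 32 |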
Table 5. Speedup of StoreCDB in RDB queries

| N_workerNodes | Speedup (N_Src = 2×10^10 rows) | Speedup (N_Src = 7×10^6 rows) |
|---|---|---|
| 2 | 1.62 | 1.33 |
| 4 | 1.79 | 1.49 |
| 8 | 2.06 | 1.63 |
| 16 | 2.64 | 2.10 |
| 32 | 4.95 | 3.32 |
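One way to read Table 5 is to divide each speedup by the number of worker nodes, which gives the parallel efficiency. The short script below does that arithmetic. The efficiency figures are derived here and are not reported in the source, and the calculation assumes the speedup is measured against a single worker node, which this excerpt does not state.

```python
# Speedup values copied from Table 5:
# worker nodes -> (speedup at 2x10^10 rows, speedup at 7x10^6 rows).
speedup = {
    2: (1.62, 1.33),
    4: (1.79, 1.49),
    8: (2.06, 1.63),
    16: (2.64, 2.10),
    32: (4.95, 3.32),
}


def efficiency(nodes, s):
    """Parallel efficiency = speedup / number of worker nodes."""
    return s / nodes


# Efficiency per node count, rounded to three decimals for display.
table = {
    n: tuple(round(efficiency(n, s), 3) for s in pair)
    for n, pair in speedup.items()
}

for n, (large, small) in table.items():
    print(f"{n:>2} nodes: {large:.3f} (2e10 rows), {small:.3f} (7e6 rows)")
```

Under that assumption, efficiency drops as nodes are added, while the larger data set keeps the higher speedup at every node count listed in the table.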