Heterogeneous data sharing technology based on two-layer metadata and ontology
Abstract: To address the difficulty of sharing multi-source, multi-class, heterogeneous data at the same time, an information sharing technology based on two-layer metadata combined with ontology is proposed. First, the structure of the two-layer metadata is analyzed, and its use for describing multiple classes of heterogeneous data in a uniform way is introduced. Second, because metadata lacks semantic information and cannot describe the implicit relationships between data classes, an ontology layer is built on top of the metadata to describe the metadata semantically and to support ontology reasoning. Finally, for data retrieval, the Lucene full-text search engine is combined with the SPARQL (SPARQL Protocol and RDF Query Language) ontology query language: a SPARQL retrieval step is added to the keyword query process and is performed before the Lucene keyword query, which improves recall and optimizes retrieval time. Match data from the 2014-2015 UEFA Champions League season were used as test data. The experimental results demonstrate the effectiveness of the method for sharing heterogeneous data and its improvement in metadata query performance.
Key words:
- heterogeneous data
- metadata
- ontology
- information sharing
- semantic retrieval
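The retrieval flow described in the abstract, a semantic lookup performed before the keyword query, can be illustrated with a minimal sketch. It is not the authors' implementation and uses neither Lucene nor a SPARQL engine: a list of triples stands in for the ontology layer, a plain inverted index stands in for the full-text index, and every record, entity name, and function name below is a hypothetical placeholder.

```python
from collections import defaultdict

# Hypothetical metadata records (placeholders, not the paper's test data).
RECORDS = {
    "m1": "match report TeamA TeamB final",
    "m2": "player profile PlayerX forward",
    "m3": "video highlights TeamB semi-final",
}

# Hypothetical ontology triples: (subject, predicate, object).
TRIPLES = [
    ("PlayerX", "playsFor", "TeamA"),
    ("TeamA", "playedIn", "final"),
]


def build_index(records):
    """Build a lowercase inverted index: term -> set of record ids."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for term in text.lower().split():
            index[term].add(rec_id)
    return index


def expand_terms(keyword, triples):
    """Stand-in for the SPARQL step: add entities directly linked to the keyword."""
    key = keyword.lower()
    terms = {key}
    for subj, _pred, obj in triples:
        if subj.lower() == key:
            terms.add(obj.lower())
        elif obj.lower() == key:
            terms.add(subj.lower())
    return terms


def search(keyword, index, triples=None):
    """Keyword lookup, optionally preceded by the ontology expansion step."""
    terms = expand_terms(keyword, triples) if triples else {keyword.lower()}
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


INDEX = build_index(RECORDS)
plain = search("PlayerX", INDEX)              # keyword only
expanded = search("PlayerX", INDEX, TRIPLES)  # ontology step first
print(plain)     # ['m2']
print(expanded)  # ['m1', 'm2']
```

In this toy setting the expanded query also returns the match report for the player's team, which the keyword alone misses. This is the kind of recall gain the abstract attributes to the added SPARQL step. The sketch says nothing about retrieval time.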