Journal of Beijing University of Aeronautics and Astronautics, 2020, Vol. 46, Issue (3): 548-562. doi: 10.13700/j.bh.1001-5965.2019.0003


Distributed user trace collection and storage system

XIA Qianchen, LYU Jianghua, MENG Xiangxi, MA Shilong

  1. School of Computer Science and Engineering, Beihang University, Beijing 100083, China
  • Received: 2019-01-09 Published: 2020-03-28
  • Corresponding author: LYU Jianghua E-mail: jhlv@buaa.edu.cn
  • About the authors: XIA Qianchen, female, Ph.D. candidate. Research interests: big data management and application, machine learning, software engineering. LYU Jianghua, female, Ph.D., lecturer. Research interests: formal methods, automated testing, software trustworthiness analysis. MA Shilong, male, Ph.D., professor. Research interests: software testing, formal methods and software engineering, big data management and application.
  • Supported by:
    National Natural Science Foundation of China (61300007, 61305054); Foundation of the Key Lab of Software Development Environment (SKLSDE-2012ZX-28, SKLSDE-2014ZX-06)

Abstract: In a distributed, complex network environment, accurately and comprehensively collecting the behavioral data that large numbers of users generate while browsing websites, together with the websites' process data, and storing these data efficiently are the prerequisite and basis of user behavior analysis. To handle the diversity of data types and the differences in how they must be stored, to improve data retrieval efficiency, and to support enterprises in analyzing user behavior according to their individual needs, this paper designs a white-box user trace collection and storage system. While users access a Web server, interaction/transaction data and user operations are generated, and browsing the website produces files of many types, such as images, videos, and product descriptions. These interfaces and data are called user browsing traces, and the operation sequence records the actual order of a user's actions. Analyzing user data and operation sequences can accurately reflect user characteristics. The collection model is built on an interface window tree and provides a unified data access interface; data are stored in different locations according to their types, so that user traces are collected completely. An application passes parameters that specify the storage location in which the database files are created, and through the access interface user data can be stored and retrieved by type and according to specific requirements. The system solves the problem of capturing, storing, and retrieving Internet-oriented user interaction traces, with good accuracy and completeness.

Key words: user behavior, user trace collection, interface window tree, unified storage, unstructured data
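The abstract describes the system only at the level of its architecture: an interface window tree that models where traces are collected, and a unified access interface that stores each kind of data in a different location chosen by the calling application. The minimal Python sketch below illustrates how those two ideas could fit together; it is not the authors' implementation. The class names `WindowNode` and `UnifiedTraceStore`, the set `UNSTRUCTURED_KINDS`, the use of SQLite for structured records, and the per-type file directories are all assumptions made for illustration.

```python
import json
import sqlite3
import time
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, List, Optional, Tuple


@dataclass
class WindowNode:
    """One node of a hypothetical interface window tree (page, frame or widget)."""
    name: str
    # Excluded from repr/eq so the parent <-> children cycle does not recurse.
    parent: Optional["WindowNode"] = field(default=None, repr=False, compare=False)
    children: List["WindowNode"] = field(default_factory=list)

    def add_child(self, name: str) -> "WindowNode":
        child = WindowNode(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        # Walk up to the root so each trace records where in the UI it happened.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Assumed split: these types are kept as files, everything else as table rows.
UNSTRUCTURED_KINDS = {"image", "video", "description"}


class UnifiedTraceStore:
    """Single access interface; each record is routed to a location by its type."""

    def __init__(self, location: str) -> None:
        # The calling application chooses the storage location; the database
        # file is created inside it.
        self.root = Path(location)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(self.root / "traces.db")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS trace (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " ts REAL, win_path TEXT, kind TEXT, payload TEXT)"
        )

    def put(self, window: WindowNode, kind: str, data: Any) -> int:
        if kind in UNSTRUCTURED_KINDS:
            # Unstructured bytes go to a per-type directory; the table keeps the path.
            folder = self.root / "files" / kind
            folder.mkdir(parents=True, exist_ok=True)
            target = folder / uuid.uuid4().hex
            target.write_bytes(data)
            payload = str(target)
        else:
            # Structured interaction/transaction data and user operations as JSON.
            payload = json.dumps(data)
        cur = self.db.execute(
            "INSERT INTO trace (ts, win_path, kind, payload) VALUES (?, ?, ?, ?)",
            (time.time(), window.path(), kind, payload),
        )
        self.db.commit()
        return cur.lastrowid

    def get(self, kind: str) -> List[Tuple[str, Any]]:
        """Return (window path, data) pairs of one type in recording order."""
        rows = self.db.execute(
            "SELECT win_path, payload FROM trace WHERE kind = ? ORDER BY seq", (kind,)
        ).fetchall()
        if kind in UNSTRUCTURED_KINDS:
            return [(w, Path(p).read_bytes()) for w, p in rows]
        return [(w, json.loads(p)) for w, p in rows]

    def operation_sequence(self) -> List[Tuple[str, Any]]:
        """User operations in the order they occurred."""
        return self.get("operation")

    def close(self) -> None:
        self.db.close()
```

For example, `UnifiedTraceStore("/data/traces")` would create `traces.db` under that directory; a click recorded on a hypothetical `home/search/button` window is stored as a table row, an image captured on the `home` page is written under `files/image`, and `operation_sequence()` returns the recorded operations in order. Keeping file paths rather than binary blobs in the table is only one possible design choice; the abstract does not say how the system splits its storage.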



Copyright © Editorial Office of Journal of Beijing University of Aeronautics and Astronautics