Journal of Beijing University of Aeronautics and Astronautics, 2010, Vol. 36, Issue 4: 500-503
Query clustering using user-query logs
Jia Rongfei, Jin Maozhong, Wang Xiaobo*
School of Computer Science and Technology, Beijing University of Aeronautics and Astronautics, Beijing 100191, China

Abstract: A new query clustering method based on user-query logs was presented. Traditional query clustering techniques rely on query impression logs and click-through logs, which are often sparse, so the resulting clusters tend to be small. The user-query log, in contrast, is much denser but also much noisier. To discover similar queries while limiting the influence of noise, queries issued by the same user within the same session (co-occurring queries) were assumed to be likely similar. Based on this assumption, query co-occurrence relations were used to build a query neighbor vector space: each query was represented by a vector of its neighboring queries, and the similarity between these neighbor vectors served as the query similarity for clustering. An adjusted version of density-based spatial clustering of applications with noise (DBSCAN) was then applied to generate the clusters. Experiments on a real dataset of 95262 queries show that the method achieves 79.77% precision and 48.21% recall, with an average cluster size of 51.
Keywords: clustering algorithms; search engines; data mining
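The abstract only summarizes the pipeline, so the following is a minimal Python sketch of its general shape under stated assumptions, not the authors' implementation. It assumes the log has already been split into sessions (one list of queries per user per time window), weights each neighbor dimension by a raw co-occurrence count, counts each query as its own neighbor, and uses cosine similarity between neighbor vectors. It then runs plain DBSCAN with distance 1 - similarity; the paper's "adjusted" DBSCAN is not described in the abstract, so that modification is not reproduced. The function names, the toy sessions, and the values `eps=0.2` and `min_pts=2` are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_neighbor_vectors(sessions):
    """Map each query to a sparse co-occurrence vector {neighbor: count}.

    `sessions` is a list of query lists, one list per (user, time window).
    Assumption: each query also counts itself as a neighbor, so two queries
    that only ever appear together still get a non-zero similarity.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        unique = list(dict.fromkeys(session))  # drop repeats, keep order
        for q in unique:
            row = vectors[q]
            for other in unique:  # includes q itself (self-neighbor)
                row[other] += 1
    return {q: dict(row) for q, row in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dbscan(points, sim, eps, min_pts):
    """Plain DBSCAN with distance = 1 - sim.

    Returns {point: cluster_id}; -1 marks noise.
    """
    labels = {p: None for p in points}

    def region(p):
        # eps-neighborhood of p, including p when sim(p, p) is 1
        return [q for q in points if 1.0 - sim(p, q) <= eps]

    cluster_id = 0
    for p in points:
        if labels[p] is not None:
            continue
        neighbors = region(p)
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise unless a later core point reaches it
            continue
        labels[p] = cluster_id  # p is a core point: start a new cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster_id  # former noise becomes a border point
                continue
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbors = region(q)
            if len(q_neighbors) >= min_pts:  # q is also core: keep expanding
                queue.extend(q_neighbors)
        cluster_id += 1
    return labels


# Hypothetical toy log: each inner list is one user's queries in one session.
sessions = [
    ["cheap flights", "airline tickets"],
    ["cheap flights", "airline tickets", "flight deals"],
    ["flight deals", "airline tickets"],
    ["python tutorial", "learn python"],
    ["learn python", "python tutorial"],
    ["weather"],
]
vectors = build_neighbor_vectors(sessions)
labels = dbscan(list(vectors), lambda a, b: cosine(vectors[a], vectors[b]),
                eps=0.2, min_pts=2)
print(labels)
# {'cheap flights': 0, 'airline tickets': 0, 'flight deals': 0,
#  'python tutorial': 1, 'learn python': 1, 'weather': -1}
```

With these hypothetical settings, the three flight queries form one cluster, the two Python queries form another, and the isolated query `weather` is labeled noise (-1), which illustrates how a density-based method discards queries with too few similar neighbors. The `region` scan is quadratic in the number of queries, so a log on the scale of the 95262-query experiment would call for an indexed or sparse-matrix neighbor search instead.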
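Received: 2009-07-10
Funding: National High Technology Research and Development Program of China (863 Program) (2007AA010302); National Natural Science Foundation of China (60603039, 90718018)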

About the author: Jia Rongfei (1981-), male, born in Shenyang, Liaoning; Ph.D. candidate; cjrf@sei.buaa.edu.cn.
Cite this article:
Jia Rongfei, Jin Maozhong, Wang Xiaobo. Query clustering using user-query logs[J]. Journal of Beijing University of Aeronautics and Astronautics, 2010, 36(4): 500-503.
Link to this article:
http://bhxb.buaa.edu.cn//CN/ or http://bhxb.buaa.edu.cn//CN/Y2010/V36/I4/500
Copyright 2010 by Journal of Beijing University of Aeronautics and Astronautics