Journal of University of Science and Technology of China, 2017, Vol. 47, Issue (1): 70-79. DOI: 10.3969/j.issn.0253-2778.2017.01.010

• Original Article

A k-medoids based clustering algorithm in location based social networks

LUO Weijia

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China; 2. College of Information Security Engineering, Chengdu University of Information Technology, Chengdu 610225, China; 3. School of Management, Chengdu University of Information Technology, Chengdu 610103, China; 4. Guangxi Higher Education Key Laboratory of Scientific Computing and Intelligent Information Processing, Guangxi Teachers Education University, Nanning 530023, China; 5. School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
  • Received: 2016-03-01  Revised: 2016-09-17  Online: 2017-01-31  Published: 2017-01-31
  • Corresponding author: QIAO Shaojie
  • About the author: LUO Weijia, female, born in 1988, master's student. Research interest: data mining. E-mail: weijialuo1026@gmail.com
  • Funding:
    Supported by the National Natural Science Foundation of China (61100045, 61165013, 61363037), the Humanities and Social Sciences Research Planning Fund of the Ministry of Education (15YJAZH058), the Humanities and Social Sciences Research Youth Fund of the Ministry of Education (14YJCZH046), the Scientific Research Project of the Sichuan Provincial Department of Education (14ZB0458), the Chengdu Soft Science Project (2015-RK00-00059-ZF), and the Open Project of the Guangxi Higher Education Key Laboratory of Scientific Computing and Intelligent Information Processing (GXSCIIP201407).

Abstract: Commonly used clustering algorithms have several drawbacks. To address them, an improved k-medoids algorithm based on an initial radius r is proposed for clustering location data in location-based social networks (LBSNs); it mitigates the effect that sensitivity to the initial cluster centers has on the clustering result. The algorithm is essentially a density-based clustering approach; the difference is that the value of k is determined by the radius r. Extensive experiments on real check-in datasets show that the proposed algorithm produces more stable clustering results and, in LBSN applications, achieves better clustering quality and faster convergence. With the sum of squared distances between objects in the same cluster as the criterion function, the proposed algorithm clearly outperforms the traditional k-medoids algorithm, and compared with the degraded k-medoids algorithm it reduces the cost by 1.2% to 2%.

Key words: social networks, density-based clustering, k-medoids, check-in data, distance similarity
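The abstract states only that k and the initial medoids follow from a radius r and that the method is essentially density-based; it does not give the exact procedure. The Python sketch below is therefore one plausible reading, not the authors' algorithm. The seeding rule (the densest point not yet covered becomes a medoid and covers everything within r of it), the alternating assign/update loop, the function names `radius_seeds` and `k_medoids`, and the use of squared Euclidean distance on coordinates assumed to be already projected to a plane are all illustrative assumptions. The reported cost is the sum of squared distances from each point to its medoid, a within-cluster criterion of the kind the abstract uses for comparison.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two planar points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def radius_seeds(points: Sequence[Point], r: float) -> List[int]:
    """Hypothetical radius-based seeding: k is however many seeds r produces.

    Density = number of points within r (computed once over all points).
    Walking from densest to sparsest, each point not yet covered becomes a
    medoid and covers every point within r of it.
    """
    r2 = r * r
    density = [sum(1 for q in points if sq_dist(p, q) <= r2) for p in points]
    covered = [False] * len(points)
    seeds: List[int] = []
    # Ties in density are broken by index so the result is deterministic.
    for i in sorted(range(len(points)), key=lambda j: (-density[j], j)):
        if covered[i]:
            continue
        seeds.append(i)
        for j, q in enumerate(points):
            if sq_dist(points[i], q) <= r2:
                covered[j] = True
    return seeds


def assign(points: Sequence[Point], medoids: List[int]) -> List[int]:
    """For each point, the position (in `medoids`) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda m: sq_dist(p, points[medoids[m]]))
        for p in points
    ]


def cost(points: Sequence[Point], medoids: List[int], labels: List[int]) -> float:
    """Criterion: sum of squared distances from each point to its medoid."""
    return sum(sq_dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))


def k_medoids(
    points: Sequence[Point], r: float, max_iter: int = 100
) -> Tuple[List[int], List[int], float]:
    """Alternate assignment and medoid update, starting from radius seeds."""
    medoids = radius_seeds(points, r)
    labels = assign(points, medoids)
    for _ in range(max_iter):
        new_medoids: List[int] = []
        for m, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == m]
            if not members:  # defensive guard against an empty cluster
                new_medoids.append(old)
                continue
            # The member with the smallest within-cluster sum of squared distances.
            best = min(
                members,
                key=lambda c: sum(sq_dist(points[c], points[i]) for i in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: no medoid changed
        medoids = new_medoids
        labels = assign(points, medoids)
    return medoids, labels, cost(points, medoids, labels)


if __name__ == "__main__":
    # Two toy groups of "check-ins"; illustrative only, not data from the paper.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for r in (0.5, 2.0, 20.0):
        medoids, labels, total = k_medoids(pts, r)
        print(f"r={r}: k={len(medoids)} medoids={medoids} cost={total}")
```

On this toy input, r = 0.5, 2.0 and 20.0 give k = 6, 2 and 1 respectively, which illustrates how k follows from the radius rather than being fixed in advance. Because the seeds are chosen deterministically from point densities instead of at random, repeated runs return the same clustering, one way the stability claimed in the abstract could arise. For raw latitude/longitude check-ins, a great-circle distance would be a more appropriate choice than planar squared Euclidean distance.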