Journal of University of Science and Technology of China, 2018, Vol. 48, Issue (1): 47-56. DOI: 10.3969/j.issn.0253-2778.2018.01.007

• Article •

A trajectory data density partition based distributed parallel clustering method

WANG Jiayu, ZHANG Zhenyu, CHU Zheng, WU Xiaohong

  1. School of Software, Xinjiang University, Urumqi 830008;
  2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046
  • Received: 2017-05-20 Revised: 2017-06-23 Online: 2018-01-01 Published: 2018-01-01
  • Corresponding author: ZHANG Zhenyu
  • About the author: WANG Jiayu, female, born in 1991, master's student. Research interest: mobile computing. E-mail: jennywang91@126.com
  • Supported by:
    National Natural Science Foundation of China (61262089).

Abstract: Advances in global positioning technology and location-based services have driven the growth of trajectory big data. Trajectory clustering, one of the most important trajectory analysis tasks, has been studied extensively. However, most existing clustering methods run in single-processor mode, so processing large-scale trajectory data takes a long time and cannot meet the needs of time-sensitive trajectory analysis tasks. To address this problem, a distributed parallel clustering method based on trajectory data density partitioning is proposed. First, the whole trajectory dataset is abstracted into a rectangular region and, through a transformation of the rectangle's longest dimension, the data are divided into several partitions of roughly equal workload, yielding local datasets suitable for distributed parallel clustering. Each worker server then runs the DBSCAN clustering algorithm on its local partition, and the manager server merges and integrates the local clustering results. Experimental results verify the effectiveness of the method and show that it improves the computational efficiency of clustering analysis to a certain extent.

Key words: trajectory big data, distributed clustering, DBSCAN algorithm, clustering algorithm
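
The abstract does not give the exact partitioning rule, so the following is only a rough sketch of one plausible reading: repeatedly take the largest partition, find the longer side of its bounding rectangle, and split the points at the median along that side so that each partition carries a similar number of points. The function names `split_longest_dim` and `balanced_partitions`, the median rule, and the use of point count as a stand-in for workload are all assumptions made for illustration, not the authors' algorithm. The worker-side DBSCAN step and the manager-side merge (including any handling of points near partition borders) are omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_longest_dim(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Split points at the median along the longer side of their bounding box.

    Assumption: a median cut is used to balance point counts; the paper's
    actual rule may differ.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Choose the axis along which the bounding rectangle is longest.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def balanced_partitions(points: List[Point], n_parts: int) -> List[List[Point]]:
    """Split the largest partition repeatedly until n_parts partitions exist."""
    parts = [list(points)]
    while len(parts) < n_parts:
        parts.sort(key=len)      # the largest partition ends up last
        largest = parts.pop()
        if len(largest) < 2:     # nothing left to split
            parts.append(largest)
            break
        parts.extend(split_longest_dim(largest))
    return parts


if __name__ == "__main__":
    demo = [(float(i), 0.0) for i in range(8)]
    for part in balanced_partitions(demo, 4):
        print(part)
```

In a setup like the one the abstract describes, each resulting partition would be handed to a worker server for local DBSCAN clustering, with the manager server combining the local results afterwards.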
