A trajectory data density partition based distributed parallel clustering method

Journal of University of Science and Technology of China, 2018, Vol. 48, Issue (1): 47-56. DOI: 10.3969/j.issn.0253-2778.2018.01.007

WANG Jiayu, ZHANG Zhenyu, CHU Zheng, WU Xiaohong

1. School of Software, Xinjiang University, Urumqi 830008; 2. College of Information Science and Engineering, Xinjiang University, Urumqi 830046

Received: 2017-05-20 Revised: 2017-06-23 Online: 2018-01-01 Published: 2018-01-01
Corresponding author: ZHANG Zhenyu
About the author: WANG Jiayu, female, born in 1991, master's student. Research field: mobile computing. E-mail: jennywang91@126.com
Funding: Supported by the National Natural Science Foundation of China (61262089).

Abstract

Abstract: Advances in global positioning technology and location-based services have driven the growth of trajectory big data. Trajectory clustering, one of the most important trajectory analysis tasks, has been studied extensively. At present, most clustering methods run in single-processor mode, so processing large-scale trajectory data takes a long time and cannot meet the needs of time-critical trajectory analysis tasks. To address this problem, a distributed parallel clustering method based on trajectory data density partitioning is proposed. First, the whole trajectory dataset is abstracted into a rectangular region, and a transformation along the longest dimension of this rectangle divides the data into several partitions of comparable workload, which form the local datasets for distributed parallel clustering. Then each worker server runs the DBSCAN clustering algorithm on its local partition, and the manager server merges and integrates the local clustering results. Experimental results verify the effectiveness of the method and show that it improves the computational efficiency of clustering analysis to a certain degree.

Key words: trajectory big data, distributed clustering, DBSCAN algorithm, clustering algorithm
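
The pipeline outlined in the abstract (partition, local DBSCAN, merge) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the partitioning rule or the merge rule, so the sketch makes assumptions. It treats the data as 2D points, cuts the bounding rectangle along its longest side into slabs holding equal numbers of points (so denser regions get narrower slabs), pads each slab by `eps` so that clusters crossing a cut can be detected, and merges two local clusters when they share a point that is a core point in at least one partition. Threads stand in for the worker servers, and the names `density_partitions`, `dbscan`, and `parallel_dbscan` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def density_partitions(points, k, eps):
    """Cut the bounding rectangle along its longest side into equal-count slabs."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    size = math.ceil(len(points) / k)
    parts = []
    for start in range(0, len(order), size):
        inner = order[start:start + size]
        # Assumption: pad each slab by eps so clusters crossing a cut can be merged.
        lo = points[inner[0]][axis] - eps
        hi = points[inner[-1]][axis] + eps
        parts.append([i for i in order if lo <= points[i][axis] <= hi])
    return parts

def dbscan(points, ids, eps, min_pts):
    """Brute-force DBSCAN on the subset `ids`; returns (labels, core point ids)."""
    def neighbors(i):
        return [j for j in ids if math.dist(points[i], points[j]) <= eps]
    labels, core, cluster = {}, set(), 0
    for i in ids:
        if i in labels:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        core.add(i)
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if j in labels:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                core.add(j)
                queue.extend(nb_j)
        cluster += 1
    return labels, core

def parallel_dbscan(points, k, eps, min_pts):
    """Cluster each partition in a worker thread, then merge the local clusters."""
    parts = density_partitions(points, k, eps)
    with ThreadPoolExecutor() as pool:  # threads stand in for worker servers
        local = list(pool.map(lambda ids: dbscan(points, ids, eps, min_pts), parts))
    parent = {}
    def find(a):  # union-find over (partition, local cluster) keys
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    core = set().union(*(c for _, c in local))
    seen = {}  # point id -> first (partition, cluster) key that contained it
    for p, (labels, _) in enumerate(local):
        for i, c in labels.items():
            if c == -1:
                continue
            key = (p, c)
            find(key)
            if i in core and i in seen:
                parent[find(key)] = find(seen[i])  # shared core point: same cluster
            seen.setdefault(i, key)
    result, names = [-1] * len(points), {}
    for p, (labels, _) in enumerate(local):
        for i, c in labels.items():
            if c != -1 and result[i] == -1:
                result[i] = names.setdefault(find((p, c)), len(names))
    return result
```

Calling `parallel_dbscan(points, k, eps, min_pts)` returns one global label per input point, with `-1` marking noise. The equal-count cut is only one way to obtain partitions of comparable workload; a real deployment would replace the thread pool with actual worker processes or servers and the brute-force neighbor search with a spatial index.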

CLC number: TP391

WANG Jiayu, ZHANG Zhenyu, CHU Zheng, WU Xiaohong. A trajectory data density partition based distributed parallel clustering method[J]. Journal of University of Science and Technology of China, 2018, 48(1): 47-56.


()
(

[1]	张玉州，张子为. 基于合作协同进化的多回收站点垃圾收运问题求解[J]. 中国科学技术大学学报, 2020, 50(5): 695-704.
[2]	徐雪丽，赵学靖. 稀疏谱聚类算法在高维数据上的应用[J]. 中国科学技术大学学报, 2017, 47(4): 311-319.

一种基于轨迹数据密度分区的分布式并行聚类方法

A trajectory data density partition based distributed parallel clustering method

PDF (PC)

可视化

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 2

编辑推荐

Metrics

本文评价