A trajectory data density partition based distributed parallel clustering method

doi:10.3969/j.issn.0253-2778.2018.01.007

Abstract

Abstract: The development of global positioning technology and location-based service have contributed to the development of trajectory big data. Trajectory clustering is one of the most important trajectory analysis tasks and has been extensively studied. Currently, most of the clustering methods operate in a single-processor mode, and large-scale trajectory data processing is a lengthy process, making it difficult to meet the strong timeliness of the trajectory analysis task. To solve the problem, a distributed parallel clustering method based on trajectory density partition is proposed. Firstly, the whole dataset is abstracted in a rectangular region, and the dataset is divided into several partitions with tasks that have almost the same amount by the transformation of the longest dimension of the rectangle, thus constructing the local datasets for distributed parallel clustering. Then the worker servers implement the DBSCAN clustering algorithm for the local partitions respectively, and the manager server merges and integrates the local clustering results. The experimental results show that the algorithm is effective and improves the computational rate of clustering analysis to a certain degree.

Key words: trajectory big data, distributed clustering, DBSCAN algorithm, clustering algorithm

CLC Number:

TP391

WANG Jiayu, ZHANG Zhenyu, CHU Zheng, WU Xiaohong. A trajectory data density partition based distributed parallel clustering method[J]. Journal of University of Science and Technology of China, 2018, 48(1): 47-56.

References

［1］
FANG Z X, SHAW S L, TU W, et al. Spatiotemporal analysis of critical transportation links based on time geographic concepts: a case study of critical bridges in Wuhan, China[J]. Journal of Transport Geography, 2012, 23(3): 44-59.
[2] LI Q N, ZHENG Y, XIE X, et al. Mining user similarity based on location history[C]// Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems. Irvine, USA: ACM Press, 2008: No. 34.
[3] ZHENG Y, ZHANG L Z, MA Z M, et al. Recommending friends and locations based on individual location history[J]. ACM Transactions on the Web, 2011, 5(1): 5(1-44).
[4] REHM F. Clustering of Flight Tracks[M]//AIAA Infotech@ Aerospace 2010. 2010: 3412.
[5] 赵恩来, 郝文宁, 赵飞, 等. 改进的基于密度的航迹聚类算法[J]. 计算机工程, 2011, 37(9): 270-272.
ZHAO Enlai, HAO Wenning, ZHAO Fei, et al. Improved track clustering algorithm based on density[J]. Computer Engineering, 2011, 37(9): 270-272.
[6] YUAN G, XIA S, ZHANG L, et al. An efficient trajectory-clustering algorithm based on an index tree[J]. Transactions of the Institute of Measurement and Control, 2012, 34(7): 850-861.
[7] BERMINGHAM L, LEE I. A general methodology for n-dimensional trajectory clustering[J]. Expert Systems with Applications, 2015, 42(21): 7573-7581.
[8] LEE J G, HAN J, WHANG K Y. Trajectory clustering: A partition-and-group framework[C]// Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. Beijing: ACM Press, 2007: 593-604.
[9]I ZAKIAN Z, MESGARI M S, ABRAHAM A. Automated clustering of trajectory data using a particle swarm optimization[J]. Computers, Environment and Urban Systems, 2016, 55: 55-65.
[10] AGGARWal C C, Li Y, Wang J Y, et al. Frequent pattern mining with uncertain data[C]// Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pairs: ACM Press, 2009: 29-38.
[11] GUPTA A, HARINARAYAN V, QUASS D. Aggregate-query processing in data warehousing environments[C]// Proceedings of the 21th International Conference on Very Large Data Bases. San Francisco: ACM Press, 1995: 358-369.
[12] HARTIGAN J A, WONG M A. Algorithm AS 136: A k-means clustering algorithm[J]. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979, 28(1): 100-108.
[13] GUHA S, RASTOGI R, SHIM K. CURE: An efficient clustering algorithm for large databases[J]. ACM SIGMOD Record, 1998, 27(2): 73-84.
[14] ZHANG T, RAMAKRISHNAN R, LIVNY M. BIRCH: A new data clustering algorithm and its applications[J]. Data Mining and Knowledge Discovery, 1997, 1(2): 141-182.
[15] ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]// Proceedings of the International Conference on Knowledge Discovery and Data Mining. Portland: ACM Press, 1996: 226-231.
[16] KUMAR K M, REDDY A R M. A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method[J]. Pattern Recognition, 2016, 58: 39-48.
[17] SMITI A, ELOUDI Z. Soft DBSCAN: Improving DBSCAN clustering method using fuzzy set theory[C]// The 6th International Conference on Human System Interaction. INSPEC, 2013: 380-385.
[18] 刘卓, 杨悦, 张健沛, 等. 不确定度模型下数据流自适应网格密度聚类算法[J]. 计算机研究与发展, 2014, 51(11): 2518-2527.
LIU Zhuo, YANG Yue, ZHANG Jianpei, et al. An adaptive grid-density based data stream clustering algorithm based on uncertainty model[J]. Journal of Computer Research and Development, 2014, 51(11): 2518-2527.
[19] 安建瑞, 张龙波, 王雷, 等. 一种基于网格与加权信息熵的 OPTICS 改进算法[J]. 计算机工程, 2017, 43(2): 206-209.
AN Jianrui, ZHANG Longbo, WANG Lei et al. An improved OPTICS algorithm based on grid and weighted information entropy[J]. Computer Engineering, 2017, 43(2): 206-209.
[20] 倪巍伟，陈耿，吴英杰，等.一种基于局部密度的分布式聚类挖掘算法[J]. 软件学报, 2008, 19(9):2339-2348.
NI Weiwei, CHEN Geng, WU Yingjie, et al. Local density based distributed clustering algorithm[J]. Journal of Software, 2008, 19(9), 2339-2348.
[21] TRAN T N, DRAB K, DASZYKOWSKI M. Revised DBSCAN algorithm to cluster data with dense adjacent clusters[J]. Chemometrics and Intelligent Laboratory Systems, 2013, 120(2): 92-96.
[22] ZHENG Yu, ZHANG Lizhu, XIE Xing, et al. Mining interesting locations and travel sequences from GPS trajectories[C]// Proceedings of International Conference on World Wild Web. Madrid, Spain: ACM Press: 791-800.
[23] ZHENG Yu, LI Quannan, CHEN Yukun, et al. Understanding mobility based on GPS data[C]// Proceedings of ACM Conference on Ubiquitous Computing . Seoul, Korea: ACM Press, 2008: 312-321.
[24] ZHENG Yu, XIE Xing, MA Weiying, GeoLife: A collaborative social networking service among user, location and trajectory[J]. Bulletin of the Technical Committee on Data Engineering, 2010, 33(2): 32-40.
[25] PATWARY M A, PALSETIA D, AGRAWAL A, et al. Scalable parallel OPTICS data clustering using graph algorithmic techniques[C]//Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Denver: ACM Press, 2013: No.49(1-12).

()
(

[1]	ZHANG Yuzhou, ZHANG Ziwei. Solving multi-station refuse collection problem based on cooperative co-evolutionary algorithm [J]. Journal of University of Science and Technology of China, 2020, 50(5): 695-704.
[2]	ZHANG Zhenya, ZHANG Meng, XIE Chenlei,ZHANG Zhaoxiang, FANG Qiansheng. Research on leakage location of secondary water distribution networks [J]. Journal of University of Science and Technology of China, 2017, 47(4): 336-341.
[3]	XU Xueli, ZHAO Xuejing. Application of sparse spectral clustering algorithm in high-dimensional data [J]. Journal of University of Science and Technology of China, 2017, 47(4): 311-319.