Journal of University of Science and Technology of China ›› 2019, Vol. 49 ›› Issue (2): 138-148. DOI: 10.3969/j.issn.0253-2778.2019.02.009

• Original Article •

A time series classification method based on trend information

林钱洪   

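  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; 2. Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044
  • Received: 2018-07-17 Revised: 2018-09-18 Online: 2019-02-28 Published: 2019-02-28
  • Corresponding author: WANG Zhihai
  • About the author: LIN Qianhong, male, born in 1996, master's student. Research interests: machine learning, data mining, time series classification. Email: 17120380@bjtu.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (61672086, 61702030, 61771058); the Beijing Natural Science Foundation (4182052); the Fundamental Research Funds for the Central Universities (2017YJS036).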

Trend information for time series classification

LIN Qianhong   

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044; 2. Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044
  • Received: 2018-07-17 Revised: 2018-09-18 Online: 2019-02-28 Published: 2019-02-28

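Abstract: The similarity measure is an important component of most time series data analysis. Among the many similarity measures, the one based on the longest common subsequence is a commonly used and effective method, but it measures only the point-to-point numerical differences between sequences and ignores their trends. To address this, a time series discretization method based on trend information is proposed, with the longest common subsequence used for similarity measurement. The method measures the trend information of time series well. In addition, it is linearly combined with an existing point-to-point function. Unlike existing similarity measures, the method considers both the trend information of a time series and the function distance, and the resulting similarity measure is used for classification under the nearest neighbor rule. For a comprehensive comparison, the effectiveness of the algorithm is tested on 42 time series datasets. The experimental results show that the proposed method effectively improves time series classification accuracy.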

Keywords: time series, trend information, time series discretization, similarity measure

Abstract: One of the most important parts of time series data analysis is choosing an appropriate similarity measure. Among similarity measures, the longest common subsequence is a commonly used and effective method. However, the original method measures only the point-to-point numerical differences between sequences and neglects how the sequences change over time. Therefore, a time series discretization method based on trend information is proposed, and the longest common subsequence is employed to measure similarity on the discretized sequences. This method measures time series trend information well. In addition, it is linearly combined with the point-to-point comparison function. In contrast to well-known measures from the literature, the proposed method takes both the trend information of the time series and the point-to-point comparison function into consideration. The new similarity measure is used for classification with the nearest neighbor rule. To provide a comprehensive comparison, a set of experiments has been conducted, testing its effectiveness on 42 real time series datasets. The experimental results show that the proposed method effectively improves the accuracy of time series classification.

Key words: time series, trend information, time series discretization, similarity measure
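
The abstract does not give the discretization rule, the point-to-point function, or the combination weight, so the following Python sketch is only one possible reading of the idea and not the authors' implementation. It assumes trends are encoded as the sign of the first difference (symbols 'U', 'D', 'F', with a hypothetical flatness threshold `eps`), uses Euclidean distance as a stand-in for the point-to-point function, and mixes the two with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
from typing import List, Sequence


def trend_symbols(series: Sequence[float], eps: float = 0.0) -> List[str]:
    """Encode each step as 'U' (up), 'D' (down) or 'F' (flat).

    Assumption: the trend is the sign of the first difference, and
    changes with magnitude <= eps count as flat.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if diff > eps:
            symbols.append("U")
        elif diff < -eps:
            symbols.append("D")
        else:
            symbols.append("F")
    return symbols


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (standard dynamic program)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def trend_distance(x: Sequence[float], y: Sequence[float], eps: float = 0.0) -> float:
    """1 - LCS / (shorter symbol length); 0.0 means identical trend strings."""
    sx, sy = trend_symbols(x, eps), trend_symbols(y, eps)
    shorter = min(len(sx), len(sy))
    if shorter == 0:
        return 0.0
    return 1.0 - lcs_length(sx, sy) / shorter


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in point-to-point function (assumption); needs equal lengths."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_distance(x, y, alpha: float = 0.5, eps: float = 0.0) -> float:
    """Linear combination of trend and point-to-point terms.

    The two terms are on different scales here; in practice the
    point-to-point term would likely need normalizing.
    """
    return alpha * trend_distance(x, y, eps) + (1.0 - alpha) * euclidean(x, y)


def classify_1nn(query, train_series, train_labels, alpha: float = 0.5, eps: float = 0.0):
    """Return the label of the training series nearest to the query."""
    best = min(
        range(len(train_series)),
        key=lambda i: combined_distance(query, train_series[i], alpha, eps),
    )
    return train_labels[best]


train = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
labels = ["rising", "falling"]
predicted = classify_1nn([0.1, 1.1, 2.2, 2.9], train, labels)
```

In this toy run the query shares the trend string of the first training series, so the trend term contributes nothing to that distance and the query is assigned the first label. Setting `alpha` to 0 reduces the measure to the point-to-point function alone, and setting it to 1 uses trend information alone.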