Journal of University of Science and Technology of China ›› 2017, Vol. 47 ›› Issue (8): 644-652. DOI: 10.3969/j.issn.0253-2778.2017.08.003

• Original Article •



An online outlier detection and confidence estimation algorithm based on Bayesian posterior ratio

SUN Shuanzhu, SONG Bei, LI Chunyan, WANG Hao

  1. Jiangsu Frontier Electric Technology Co. Ltd., Nanjing 211102, China;
  2. State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210023, China
  • Received: 2017-05-26 Revised: 2017-07-14 Online: 2017-08-31 Published: 2017-08-31
  • Corresponding author: WANG Hao
  • About the author: SUN Shuanzhu, male, born in 1973, master's degree, professor-level senior engineer. Research interests: energy-saving and emission-reduction technology for the power industry, and industrial big data mining. E-mail:
  • Funding:

Abstract: To identify outliers in a class of large-volume industrial time series that update rapidly, vary smoothly (small variance), and lack manual labels, an online outlier detection and confidence estimation algorithm based on the Bayesian posterior ratio is proposed. The algorithm combines prediction-based detection with hypothesis testing: it first fits an autoregressive model to the time series, then applies a Bayesian posterior test to the prediction residuals and uses the logarithm of the posterior probability ratio to identify outliers. To reduce misjudgments, once detection is complete a self-organizing map neural network is used to compute state transition probabilities, from which the confidence of each flagged outlier is evaluated. The models are updated periodically so that the parameters stay in step with changes in the data, which improves detection accuracy. Experimental results show that the algorithm detects outliers in time series online accurately and quickly and provides reliable confidence estimates, giving it good adaptability and high practical value.

Key words: time series, outlier detection, Bayesian posterior ratio, confidence estimation
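
The detection step summarized in the abstract (autoregressive prediction followed by a posterior log-ratio test on the residual) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the two hypotheses, so the sketch assumes a zero-mean Gaussian residual with standard deviation `sigma` under the "normal" hypothesis and a wider zero-mean Gaussian (`scale * sigma`) under the "outlier" hypothesis. The function names and the parameters `prior_outlier` and `scale` are hypothetical, and the self-organizing map confidence stage is omitted.

```python
import numpy as np


def fit_ar(series, order):
    """Least-squares fit of x[t] = c + sum_i a[i] * x[t-1-i].

    Returns (coef, resid_std), where coef[0] is the intercept c.
    """
    x = np.asarray(series, dtype=float)
    # The row for time t holds [1, x[t-1], ..., x[t-order]].
    rows = [np.concatenate(([1.0], x[t - order:t][::-1]))
            for t in range(order, len(x))]
    design = np.array(rows)
    target = x[order:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return coef, resid.std(ddof=order + 1)


def predict_next(coef, recent):
    """One-step forecast from the last `order` observations (oldest first)."""
    recent = np.asarray(recent, dtype=float)
    return coef[0] + coef[1:] @ recent[::-1]


def log_posterior_ratio(residual, sigma, prior_outlier=0.01, scale=10.0):
    """log P(outlier | r) - log P(normal | r).

    Assumed model (not from the paper): both hypotheses are zero-mean
    Gaussians, with std `sigma` (normal) and `scale * sigma` (outlier).
    """
    def log_normal_pdf(r, s):
        return -0.5 * np.log(2 * np.pi * s**2) - r**2 / (2 * s**2)

    log_out = np.log(prior_outlier) + log_normal_pdf(residual, scale * sigma)
    log_norm = np.log(1 - prior_outlier) + log_normal_pdf(residual, sigma)
    return log_out - log_norm
```

Under these assumptions, a new observation `x_new` would be flagged when `log_posterior_ratio(x_new - predict_next(coef, window), sigma)` is positive, and calling `fit_ar` again on a recent window at regular intervals would play the role of the periodic model update mentioned in the abstract.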
