Journal of University of Science and Technology of China ›› 2018, Vol. 48 ›› Issue (11): 869-876. DOI: 10.3969/j.issn.0253-2778.2018.11.001

• Original Paper •

Outlier detection of Yangtze River basin meteorological data based on robust S-estimator

JIN Baisuo, LI Chikun   

  1. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
  • Received:2018-04-18 Revised:2018-06-15 Accepted:2018-06-15 Online:2018-11-30 Published:2018-06-15

Abstract: Outliers are unavoidable in high-dimensional data such as meteorological data, and the most widely used least-squares method is neither robust to outliers nor sensitive in detecting them. Robust estimation keeps the estimators from being strongly influenced by outliers, so that the outliers can be better identified. By adding Tukey’s biweight function constraints, a principal component analysis model based on the robust S-estimator was established, which converges rapidly and does not need to assume a specific form of the distribution function. The observations were then smoothed with a B-spline basis, the mean squared norm of the residuals was used as the test statistic, and the adjusted box-plot, which is also robust, was used to detect the outliers. In the example, more than 58 thousand meteorological measurements recorded over 60 years in 5 cities in the Yangtze River basin were analyzed. The outlier detection procedure based on classical principal component analysis was compared with the one based on the robust S-estimator on this data set. Compared with the classical approach, the procedure based on the robust S-estimator gives more information on the abnormal data and thus identifies outliers better.
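
The role of Tukey's biweight function in an S-estimator can be illustrated with a minimal sketch. It does not reproduce the paper's principal component model; it only computes the robust M-scale of a residual vector, which is the quantity an S-estimator minimizes. The function names, the tuning constant c = 1.547 and the target b = 0.5 are assumptions (conventional choices for a 50% breakdown point with a loss normalized to a maximum of 1), not values taken from the paper.

```python
import numpy as np


def tukey_biweight_rho(u, c=1.547):
    """Tukey's biweight loss, normalized so that rho(u) = 1 for |u| >= c."""
    u = np.asarray(u, dtype=float)
    t = np.clip(np.abs(u) / c, 0.0, 1.0)  # saturates at 1 beyond the cutoff
    return 1.0 - (1.0 - t**2) ** 3


def m_scale(residuals, c=1.547, b=0.5, max_iter=200, tol=1e-10):
    """Solve mean(rho(r / s)) = b for s by fixed-point iteration.

    c and b are assumed conventional values, not taken from the paper.
    """
    r = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # starting value: normalized median |r|
    if s == 0.0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(tukey_biweight_rho(r / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return float(s_new)
        s = s_new
    return float(s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=5000)
    dirty = clean.copy()
    dirty[:500] = 50.0  # 10% gross outliers (synthetic, for illustration only)
    print("M-scale, clean data:       ", m_scale(clean))
    print("M-scale, contaminated data:", m_scale(dirty))
    print("Std. dev., contaminated:   ", np.std(dirty))
```

Because the loss is bounded, each gross outlier contributes at most 1 to the average, so the scale estimate moves only slightly under contamination, whereas the classical standard deviation is dominated by the outliers.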

Key words: robust estimation, principal component analysis, outlier detection, high-dimensional data, dimension reduction
