[1] ROSIPAL R. Kernel-based regression and objective nonlinear measures to assess brain functioning[D]. Paisley: University of Paisley, 2001.
[2] SCHÖLKOPF B, SMOLA A, MÜLLER K R. Nonlinear component analysis as a kernel eigenvalue problem[J]. Neural Computation, 1998, 10(5): 1299-1319.
[3] FINE S, SCHEINBERG K. Efficient SVM training using low-rank kernel representations[J]. Journal of Machine Learning Research, 2001, 2: 243-264.
[4] WILLIAMS C K I, SEEGER M. Using the Nyström method to speed up kernel machines[C]// Proceedings of the 13th Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2000: 661-667.
[5] BACH F. Sharp analysis of low-rank kernel matrix approximations[C]// Proceedings of the 26th Conference on Learning Theory. Princeton: PMLR, 2013, 30: 185-209.
[6] ALAOUI A E, MAHONEY M W. Fast randomized kernel ridge regression with statistical guarantees[C]// Proceedings of the 28th Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 775-783.
[7] RUDI A, CAMORIANO R, ROSASCO L. Less is more: Nyström computational regularization[C]// Proceedings of the 28th Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 1657-1665.
[8] RAHIMI A, RECHT B. Random features for large-scale kernel machines[C]// Proceedings of the 20th Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2007: 1177-1184.
[9] YAO Y, ROSASCO L, CAPONNETTO A. On early stopping in gradient descent learning[J]. Constructive Approximation, 2007, 26(2): 289-315.
[10] BLANCHARD G, KRÄMER N. Optimal learning rates for kernel conjugate gradient regression[C]// Proceedings of the 23rd Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2010: 226-234.
[11] ZHANG Y, DUCHI J, WAINWRIGHT M. Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates[J]. Journal of Machine Learning Research, 2015, 16: 3299-3340.
[12] GU Q, HAN J. Clustered support vector machines[C]// Proceedings of the 16th International Conference on Artificial Intelligence and Statistics. Scottsdale: PMLR, 2013, 31: 307-315.
[13] HSIEH C J, SI S, DHILLON I S. A divide-and-conquer solver for kernel support vector machines[C]// Proceedings of the 31st International Conference on Machine Learning. Beijing: PMLR, 2014, 32(1): 566-574.
[14] TANDON R, SI S, RAVIKUMAR P, et al. Kernel ridge regression via partitioning[J/OL]. (2016-08-05) [2017-05-24]. https://arxiv.org/pdf/1608.01976.pdf.
[15] GITTENS A, MAHONEY M W. Revisiting the Nyström method for improved large-scale machine learning[J]. Journal of Machine Learning Research, 2016, 17(1): 3977-4041.
[16] HUANG P S, AVRON H, SAINATH T N, et al. Kernel methods match deep neural networks on TIMIT[C]// Proceedings of the 39th IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2014: 205-209.
[17] BOTTOU L, VAPNIK V. Local learning algorithms[J]. Neural Computation, 1992, 4(6): 888-900.
[18] ZHANG Y, DUCHI J C, WAINWRIGHT M J. Communication-efficient algorithms for statistical optimization[J]. Journal of Machine Learning Research, 2013, 14(1): 3321-3363.
[19] MACKEY L, TALWALKAR A, JORDAN M I. Divide-and-conquer matrix factorization[C]// Proceedings of the 25th Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2012: 1134-1142.
[20] PAN Y, XIA R, YIN J, et al. A divide-and-conquer method for scalable robust multitask learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(12): 3163-3175.