Journal of University of Science and Technology of China ›› 2020, Vol. 50 ›› Issue (7): 993-1002. DOI: 10.3969/j.issn.0253-2778.2020.07.017

• Research Article •

图正则化的模糊局部坐标编码概念分解模型

张怿恺,彭勇,孔万增,文益民   

  1.杭州电子科技大学计算机科学与技术学院,浙江杭州 310018;2.桂林电子科技大学计算机与信息安全学院,广西桂林 541000
  • 收稿日期:2020-04-30 修回日期:2020-06-22 接受日期:2020-06-22 出版日期:2020-07-31 发布日期:2020-06-22
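  • Corresponding author: PENG Yong
  • About the author: ZHANG Yikai, male, born in 1998, master's student; research interests: machine learning and pattern recognition. E-mail: yikaizhang@hdu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (61971173, 61602140), the Science and Technology Program of Zhejiang Province (2017C33049), the China Postdoctoral Science Foundation (2017M620470), and the Zhejiang Xinmiao Talent Program (2019R407030).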

Fuzzy local coordinate concept factorization with graph regularization

ZHANG Yikai, PENG Yong, KONG Wanzeng, WEN Yimin   

  1. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; 2. School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541000, China
  • Received:2020-04-30 Revised:2020-06-22 Accepted:2020-06-22 Online:2020-07-31 Published:2020-06-22

摘要: 现有的基于矩阵分解的聚类模型的训练过程大多需要两个独立的步骤:一是通过模型自身对数据集进行训练获得系数矩阵,二是对得到的系数矩阵进一步使用K-means方法来获得最终的聚类结果.这种两阶段模式一方面增加了计算消耗,另一方面也会因为K-means对初始聚类中心的敏感而对聚类效果产生一定的影响.针对此问题,本文提出了一种图正则化的模糊局部坐标编码概念分解模型.该模型通过对系数矩阵添加约束使得系数矩阵行和为1,从而避免了再次使用K-means方法进行二次训练,而直接由系数矩阵获得聚类结果.另外,由于此系数矩阵的约束,该模型实现了模糊聚类,增强了聚类结果的可解释性.本文通过对人工合成数据的测试,验证了该模型的模糊性与可解释性;同时在常用的标准数据集上,通过与现有的聚类方法相比较,同样获得了较好的聚类效果.

关键词: 概念分解, 局部坐标编码, 模糊聚类, 图正则化

Abstract: Matrix factorization is an effective and efficient approach to clustering problems in machine learning. However, most traditional matrix-factorization-based clustering models need two separate steps to obtain the final assignments. First, the original data are decomposed into a basis matrix and a coefficient matrix by the model. Second, the learned coefficient matrix is fed into K-means for discretization. This two-step paradigm adds computational burden, and the sensitivity of K-means to its initialization may degrade the final results. To address this, a novel model termed fuzzy local coordinate concept factorization with graph regularizer (FLCCF-G) is proposed, which avoids K-means by constraining each row of the non-negative coefficient matrix to sum to one. The final clustering results can then be obtained directly by locating the maximum value in each row of the coefficient matrix. In addition, this constraint makes the proposed model a fuzzy clustering model rather than a hard clustering one, which gives it better interpretability for data points on the boundaries between clusters. Experiments on synthetic data verify the fuzziness and interpretability of the model, and experiments on common benchmark data sets show that FLCCF-G achieves good clustering performance in comparison with existing clustering methods.

Key words: concept factorization, local coordinate coding, fuzzy clustering, graph regularizer
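
The read-out step described in the abstract, taking each sample's cluster from the largest entry in its row of the coefficient matrix, can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `assign_clusters`, the tolerance `atol`, and the example matrix `V` are assumptions made here, and the sketch presumes a coefficient matrix that already satisfies the non-negativity and row-sum-to-one constraints (how FLCCF-G learns such a matrix is not shown).

```python
import numpy as np


def assign_clusters(V, atol=1e-8):
    """Read cluster labels directly from a row-stochastic coefficient matrix.

    V is assumed to be (n_samples, n_clusters), non-negative, with each row
    summing to one, so each row can be read as fuzzy memberships.
    The name and signature are hypothetical, chosen for this illustration.
    """
    V = np.asarray(V, dtype=float)
    if V.ndim != 2:
        raise ValueError("V must be a 2-D array")
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    if not np.allclose(V.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of V must sum to one")
    # The largest membership in each row gives the hard label; no K-means pass.
    return V.argmax(axis=1)


# Made-up memberships for three samples and three clusters.
V = np.array([
    [0.90, 0.10, 0.00],  # clearly in cluster 0
    [0.20, 0.70, 0.10],  # mostly in cluster 1
    [0.45, 0.05, 0.50],  # near the boundary between clusters 0 and 2
])
labels = assign_clusters(V)
```

The third row shows why the fuzzy memberships matter for interpretation: the hard label is cluster 2, but the row itself records that the sample lies close to cluster 0 as well.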
