Fuzzy local coordinate concept factorization with graph regularization

doi:10.3969/j.issn.0253-2778.2020.07.017

Journal of University of Science and Technology of China, 2020, Vol. 50, Issue (7): 993-1002. DOI: 10.3969/j.issn.0253-2778.2020.07.017

ZHANG Yikai, PENG Yong, KONG Wanzeng, WEN Yimin

1.School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; 2.School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541000, China

Received: 2020-04-30 Revised: 2020-06-22 Accepted: 2020-06-22 Online: 2020-07-31 Published: 2020-06-22
Corresponding author: PENG Yong
About the author: ZHANG Yikai, male, born in 1998, master's student; research interests: machine learning and pattern recognition. E-mail: yikaizhang@hdu.edu.cn
Funding:
Supported by the National Natural Science Foundation of China (61971173, 61602140), the Zhejiang Provincial Science and Technology Program (2017C33049), the China Postdoctoral Science Foundation (2017M620470), and the Zhejiang Province Xinmiao Talent Program (2019R407030).

Abstract

Matrix factorization is an effective and efficient approach to clustering problems in machine learning. However, most existing matrix-factorization-based clustering models need two separate steps to obtain the final assignments. First, the original data are decomposed into a basis matrix and a coefficient matrix by the model itself. Second, the learned coefficient matrix is fed into K-means for discretization. This two-stage paradigm adds computational cost and, because K-means is sensitive to the initial cluster centers, can also degrade the final clustering results. To address this problem, a novel model termed fuzzy local coordinate concept factorization with graph regularizer (FLCCF-G) is proposed. By constraining each row of the non-negative coefficient matrix to sum to one, the model avoids a second training pass with K-means: the clustering result is obtained directly by taking the position of the maximum value in each row of the coefficient matrix. In addition, this constraint makes FLCCF-G a fuzzy rather than a hard clustering model, which improves the interpretability of the results for data points lying on the boundaries between clusters. Tests on synthetic data verify the fuzziness and interpretability of the model, and comparisons with existing clustering methods on commonly used benchmark data sets show that it also achieves good clustering performance.

Key words: concept factorization, local coordinate coding, fuzzy clustering, graph regularizer
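
The readout described in the abstract can be illustrated with a minimal sketch. It assumes a coefficient matrix with one row per sample and one column per cluster; the names `project_rows_to_simplex`, `hard_labels`, `scores`, and `G` are hypothetical. The Euclidean projection onto the probability simplex used here is only one common way to make rows non-negative with unit sum, and it is an assumption of this sketch rather than the optimization procedure of FLCCF-G itself.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Project each row of V onto {g : g >= 0, sum(g) = 1} (Euclidean projection).

    Assumption: this stands in for whatever update the model uses to keep
    the coefficient matrix non-negative with rows summing to one.
    """
    V = np.asarray(V, dtype=float)
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]              # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0             # cumulative sums minus the target sum
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                     # support condition for each prefix length
    rho = k - np.argmax(cond[:, ::-1], axis=1)   # last prefix length satisfying cond (1-based)
    theta = css[np.arange(n), rho - 1] / rho     # per-row shift
    return np.maximum(V - theta[:, None], 0.0)


def hard_labels(G):
    """Cluster index of each sample: position of the row maximum."""
    return np.argmax(G, axis=1)


# Hypothetical unconstrained scores for 3 samples and 3 clusters.
scores = np.array([[0.2, 0.9, 0.1],
                   [2.0, 0.0, -1.0],
                   [0.6, 0.5, -0.3]])

G = project_rows_to_simplex(scores)   # fuzzy memberships, each row sums to one
labels = hard_labels(G)               # direct readout, no K-means step
```

In this toy input the third sample ends up with memberships split between the first two clusters, which is the kind of boundary point for which a fuzzy membership row is more informative than a single hard label.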

CLC number:

TP18

ZHANG Yikai, PENG Yong, KONG Wanzeng, WEN Yimin. Fuzzy local coordinate concept factorization with graph regularization[J]. Journal of University of Science and Technology of China, 2020, 50(7): 993-1002.


()
()

[1]	马玉莲, 崔文泉. 一种预测高频价格的端到端双目标多任务方法[J]. 中国科学技术大学学报, 2021, 51(3): 246-258.
[2]	陶陶，柏建树，刘恒，侯书东，郑啸. 基于WGAN反馈的深度学习差分隐私保护方法[J]. 中国科学技术大学学报, 2020, 50(8): 1064-1071.
[3]	高翔，陈力. 分组随机梯度下降法：掉队和延迟的平衡[J]. 中国科学技术大学学报, 2020, 50(8): 1156-1161.
[4]	李鹏，郑宇，张谈贵. 基于视觉显著性的无人机目标跟踪[J]. 中国科学技术大学学报, 2020, 50(8): 1162-1169.
[5]	王悦，李京. 基于可视化的卷积神经网络优化方法研究[J]. 中国科学技术大学学报, 2020, 50(7): 959-967.
[6]	崔文泉，余厚莹，侯晓天. 不均衡数据情形的基于聚焦损失的CGAN的集成分类方法[J]. 中国科学技术大学学报, 2020, 50(7): 968-976.
[7]	苏守宝，陈秋鑫，王池社，李智. 群活性反馈的变异自适应分数阶粒子群优化[J]. 中国科学技术大学学报, 2020, 50(7): 1026-1034.
[8]	石陆魁，郭林林，房子哲，张军. 基于Spark的并行ISOMAP算法[J]. 中国科学技术大学学报, 2019, 49(10): 842-850.
[9]	张会敏，杨明，吕静. 基于自适应核联合稀疏表示的多特征高光谱图像分类[J]. 中国科学技术大学学报, 2018, 48(4): 298-306.
[10]	饶齐，杨燕*，滕飞. 基于多视图加权聚类集成的高速列车工况识别[J]. 中国科学技术大学学报, 2018, 48(1): 35-41.
[11]	张皓，吴建鑫，. 集成最大汇合: 最大汇合时只有最大值有用吗[J]. 中国科学技术大学学报, 2017, 47(10): 799-807.

图正则化的模糊局部坐标编码概念分解模型

Fuzzy local coordinate concept factorization with graph regularization

PDF (PC)

可视化

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 11

编辑推荐

Metrics

本文评价