中国科学技术大学学报 ›› 2018, Vol. 48 ›› Issue (6): 477-485.DOI: 10.3969/j.issn.0253-2778.2018.06.006

• 论著 • 上一篇    下一篇

基于KS检验的高斯混合模型分裂与合并算法

蒋硕然,陈亚瑞*,秦智飞,杨巨成   

  1. 天津科技大学计算机科学与信息工程学院,天津 300457
  • 收稿日期:2017-09-20 修回日期:2018-04-10 接受日期:2018-04-10 出版日期:2018-06-30 发布日期:2018-04-10
  • 通讯作者: 陈亚瑞
  • 作者简介:蒋硕然,男,1992年生,硕士。研究方向:机器学习。E-mail: srjiang@mail.tust.edu.cn
  • 基金资助:
    国家自然科学基金(61402332, 61502338,61402331); 天津市科学技术委员会项目(17JCQNJC00400,15ZCZDGX00200,15JCQNJC00700)资助.

Split and merge algorithm for Gaussian mixture model based on KS test

JIANG Shuoran, CHEN Yarui, QIN Zhifei, YANG Jucheng   

  1. Tianjin University of Science & Technology, College of Computer Science and Information Engineering, Tianjin 300457
  • Received:2017-09-20 Revised:2018-04-10 Accepted:2018-04-10 Online:2018-06-30 Published:2018-04-10

摘要: 高斯混合模型是有限个独立高斯模型的线性组合,估计其子模型个数是一个重要的研究问题.一类典型的算法是以最短描述长度为目标函数,在迭代过程中通过对子模型进行分裂与合并操作确定子模型个数.这类方法一般采用熵比、KL散度和模型相似度作为分裂和合并的判别准则.但是熵比或KL散度准则对稀疏子模型和凹形子模型过于敏感导致过度分裂,模型相似度准则不能反映合并后模型的高斯拟合优度导致过度合并.在算法的迭代过程中,这些过度分裂与合并操作产生振荡现象.针对估计子模型个数时出现的过度分裂与合并问题,基于KS检验的高斯混合模型的分裂与合并算法选择熵比与KS检验作为分裂的判别准则,模型相似度和KS检验作为合并的判别准则.最后在六个数据集上进行了实验证明算法的有效性.

关键词: 高斯混合模型, 最短描述长度, 熵比, KS检验

Abstract: Gaussian mixture model is a linear combination of finite numbers of independent Gaussian models. Estimating the number of components is an important research area. One class of algorithms based on the minimum description length determine the number of components by splitting and merging components during the iterations. Traditional algorithms use entropy ratio, KL divergence, model similarity as split and merge criteria. However, entropy ratio and KL divergence might result in excessive split because of their excessive sensitivity to sparse or concave models, and model similarity might result in excessive merge because of its inability to assess the merged models’ goodness of fitting Gaussian. In the iterations of algorithm, these excessive splitting and merging operations may cause oscillations. For these problems entropy ratio and KS test as split criteria, and models similarity and KS test were used as merge criteria, which be called problems, a split and merge algorithm for Gaussian mixture model based on KS test is proposed, with entropy ratio and KS test used as split criteria and model similarity and KS test as merge criteria. This algorithm is capable of preventing excessive split and merge, as validated by experiments conducted on seven datasets.

Key words: Gaussian mixture model, minimum description length, entropy ratio, KS test

中图分类号: