A calibrated label ranking method based on naive Bayes

doi:10.3969/j.issn.0253-2778.2018.01.009

Journal of University of Science and Technology of China, 2018, Vol. 48, Issue (1): 65-74.

ZHANG Qilong, DENG Weibin, HU Feng, QU Yuan, HU Zongrong

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2017-05-22 Revised: 2017-06-23 Online: 2018-01-01 Published: 2018-01-01
Corresponding author: HU Feng
About the author: ZHANG Qilong, male, born in 1989, master's student. Research interests: computational intelligence, data mining. E-mail: 814150638@qq.com
Funding: Supported by the National Natural Science Foundation of China (61473001, 71071045, 71131002).

Abstract: The traditional calibrated label ranking (CLR) algorithm transforms the multi-label problem using pairwise label associations and predicts from the resulting comparisons. Its calibration step compares against the results produced by binary relevance (BR), so its predictions depend to some extent on the BR results, which limits the algorithm on some datasets. To better separate relevant labels from irrelevant ones, a calibration method for the label boundary region is proposed: labels that fall on the boundary between the relevant and irrelevant sets are further corrected using Bayesian probability, which improves classification accuracy in the boundary region. The proposed naive Bayes based calibrated label ranking method (NBCLRM) is compared with seven traditional methods, including CLR. Experimental results show that the proposed algorithm not only lets the prediction results be tuned as required by modifying the thresholds ε and μ, but also effectively improves on the performance of traditional multi-label learning methods.

Key words: data mining, naive Bayes, calibrated label ranking, multi-label learning algorithm
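The boundary-region correction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the exact roles of ε and μ, so the sketch assumes that ε is the half-width of the boundary region around the vote count of CLR's virtual calibration label, and that μ is the naive Bayes posterior cutoff applied inside that region. The function name, its parameters, and the precomputed `nb_posterior` values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def clr_with_boundary_correction(
    votes: Dict[str, int],
    virtual_votes: int,
    nb_posterior: Dict[str, float],
    epsilon: int = 1,
    mu: float = 0.5,
) -> List[str]:
    """Return the labels predicted as relevant.

    votes:         pairwise-comparison votes won by each real label (as in CLR).
    virtual_votes: votes won by the virtual calibration label.
    nb_posterior:  hypothetical naive Bayes estimate of P(label is relevant | x).
    epsilon:       ASSUMED half-width of the boundary region, in votes.
    mu:            ASSUMED posterior cutoff used inside the boundary region.
    """
    relevant = []
    for label, v in votes.items():
        if abs(v - virtual_votes) <= epsilon:
            # Boundary region: the vote count is too close to the calibration
            # label to trust, so defer to the naive Bayes posterior.
            if nb_posterior[label] >= mu:
                relevant.append(label)
        elif v > virtual_votes:
            # Clearly above the calibration label: relevant, as in plain CLR.
            relevant.append(label)
        # Clearly below the calibration label: irrelevant, nothing to do.
    return relevant


example_votes = {"a": 5, "b": 3, "c": 2, "d": 0}
example_posterior = {"a": 0.9, "b": 0.3, "c": 0.8, "d": 0.9}
predicted = clr_with_boundary_correction(
    example_votes, virtual_votes=3, nb_posterior=example_posterior,
    epsilon=1, mu=0.5,
)
print(predicted)
```

Under these assumptions, widening ε hands more labels to the naive Bayes check, while raising μ makes that check stricter, which is one way the two thresholds could be used to tune the prediction.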

CLC number: TP391

ZHANG Qilong, DENG Weibin, HU Feng, QU Yuan, HU Zongrong. A calibrated label ranking method based on naive Bayes[J]. Journal of University of Science and Technology of China, 2018, 48(1): 65-74.


()
(

[1]	胡心颖, 何钰, 孙广中. 基于概率图模型的计算机课程教学认知诊断框架[J]. 中国科学技术大学学报, 2021, 51(1): 12-21.
[2]	龚乐君，周佘海，程逸飞，高志宏，李华康. 单细胞RNA序列数据的PBMC相关细胞的识别[J]. 中国科学技术大学学报, 2020, 50(7): 1013-1018.
[3]	孙更新，宾晟. 多关系社交网络中基于兴趣匹配的网络舆情传播模型[J]. 中国科学技术大学学报, 2018, 48(9): 730-738.
[4]	李雅美，王昌栋. 基于标签的个性化旅游推荐[J]. 中国科学技术大学学报, 2017, 47(7): 547-555.
[5]	顾敏，郭庆，曹野，朱峰，顾彦慧，周俊生，曲维光，. 基于结构和文本特征的网页分类技术研究[J]. 中国科学技术大学学报, 2017, 47(4): 290-296.
[6]	卜尧，吴斌，陈玉峰，白德盟. BDAP——一个基于Spark的数据挖掘工具平台[J]. 中国科学技术大学学报, 2017, 47(4): 358-368.
[7]	赖英旭，许昕，杨震. 基于尾项加权的自适应文本分类方法研究[J]. 中国科学技术大学学报, 2011, 41(7): 607-614.

一种基于朴素贝叶斯的校准标签排序方法

A calibrated lable ranking method based on naive Bayes

PDF (PC)

可视化

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 7

编辑推荐

Metrics

本文评价