A multi-domain sentiment classification model based on sample filtering and transfer learning

Journal of University of Science and Technology of China, 2019, Vol. 49, Issue (1): 8-14. DOI: 10.3969/j.issn.0253-2778.2019.01.002

QU Zhaowei

School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received: 2018-05-29 Revised: 2018-09-18 Online: 2019-01-31 Published: 2019-01-31
Corresponding author: ZHAO Yanjiao
About the author: QU Zhaowei, male, born in 1970, PhD/professor. Research interests: artificial intelligence, data mining, and computer network technology. E-mail: zwqu@bupt.edu.cn
Funding: Supported by the National Natural Science Foundation of China (61672108).

Abstract: Most sentiment classification models are trained and tested on a single dataset. The parameters obtained by training on one dataset are not suitable for another dataset, so such models lack generality. To address this, a multi-domain sentiment classification model (MDSC) is proposed. Using sample filtering and transfer learning, MDSC learns parameters that apply to different datasets across multiple domains, which makes the model more widely applicable and extensible. Specifically, a document is first mapped to a distributed representation of its domain, which serves as a bridge between domain classification and sentiment classification, and sentiment classification is then performed. Because a general model needs representative training samples, MDSC constructs a domain-independent sentiment lexicon and uses it to filter the sentences belonging to the same document, yielding a high-quality training set. To improve classification accuracy and reduce training time, a parameter-based transfer learning method is used, in which a neural network produces document embeddings that are then classified. Experiments on datasets covering 15 different domains show that the proposed model achieves better results than the other sentiment classification models it was compared with.

Key words: sentiment classification, sample filtering, transfer learning, sentiment lexicon, neural network
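The sample-filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the lexicon, the tokenization, or the filtering criterion, so the tiny word list `SENTIMENT_LEXICON`, the helper names `split_sentences` and `filter_sentences`, and the `min_hits` threshold are all hypothetical choices made for illustration. The sketch only shows the general idea of keeping the sentences of a document that contain lexicon words.

```python
import re

# Hypothetical toy lexicon (assumption). The paper's domain-independent
# sentiment lexicon is not listed in the abstract.
SENTIMENT_LEXICON = {
    "good", "great", "excellent", "love",
    "bad", "poor", "terrible", "hate",
}


def split_sentences(document):
    """Split a document after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def filter_sentences(document, lexicon=SENTIMENT_LEXICON, min_hits=1):
    """Keep sentences with at least `min_hits` lexicon words.

    The threshold rule is an assumed criterion, not the one from the paper.
    """
    kept = []
    for sentence in split_sentences(document):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        hits = sum(1 for token in tokens if token in lexicon)
        if hits >= min_hits:
            kept.append(sentence)
    return kept


review = (
    "I bought this camera last week. The lens is excellent! "
    "Shipping took three days. The battery life is terrible."
)
filtered = filter_sentences(review)
for sentence in filtered:
    print(sentence)
```

In this toy setting, the sentences without any lexicon word are dropped, so only the opinion-bearing sentences of the review would be passed on to training.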

QU Zhaowei, ZHAO Yanjiao, WANG Xiaoru. A multi-domain sentiment classification model based on sample filtering and transfer learning[J]. Journal of University of Science and Technology of China, 2019, 49(1): 8-14.

