Journal of University of Science and Technology of China, 2017, Vol. 47, Issue (1): 63-69. DOI: 10.3969/j.issn.0253-2778.2017.01.009

• Original Paper •

Improving emotion expression extraction in Chinese microblogs via new words detection
(Original title: 利用新词探测提高中文微博的情感表达抽取)

WAN Qi (万琪)

  1. College of Computer Science, Sichuan University, Chengdu 610065, China; 2. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321004, China
  • Received: 2016-03-01 Revised: 2016-09-17 Online: 2017-01-31 Published: 2017-01-31
  • Corresponding author: YU Zhonghua (于中华)
  • About the author: WAN Qi, male, born in 1991, master's student. Research interest: natural language processing. E-mail: youngwq12@163.com
  • Funding:
    Supported by the Science and Technology Support Program of Sichuan Province (2014GZ0063) and the Natural Science Foundation of Zhejiang Province (LY12F02010).

Abstract: Emotion expression extraction is an important task in fine-grained sentiment mining. Chinese microblogs contain many new Internet words and non-standard words, and existing methods handle them poorly when extracting emotion expressions. This paper finds that new words in microblogs are concentrated in the emotion-expression portions of the text, and therefore proposes a CRF-based joint extraction model that incorporates new word detection into the emotion expression extraction task. Experiments show that new word detection is a useful indicator for emotion expression extraction in microblog text: on microblog data sets from the movie domain and the open domain, F1 improves by more than 2% in both cases.

Key words: sentiment analysis, new word detection, conditional random field, information extraction
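
The abstract does not specify the tag scheme or features of the joint model. As an illustration only, one common way to let a single linear-chain sequence labeler (such as a CRF) predict two annotations at once is to pair their per-character BIO tags into one joint label. The sketch below assumes character-level tagging; the labels `EMO` and `NW`, the helper names, and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def bio_tags(length: int, spans: List[Tuple[int, int]], label: str) -> List[str]:
    """Return character-level BIO tags for half-open spans [start, end)."""
    tags = ["O"] * length
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def joint_tags(emotion: List[str], new_word: List[str]) -> List[str]:
    """Pair the two tag sequences so that one labeler predicts both."""
    return [f"{e}|{n}" for e, n in zip(emotion, new_word)]


def split_tags(joint: List[str]) -> Tuple[List[str], List[str]]:
    """Recover the separate emotion and new-word tag sequences."""
    pairs = [t.split("|") for t in joint]
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Hypothetical example: the slang word "给力" is both a new word
# and the emotion expression of the sentence.
text = "这电影给力"
emo = bio_tags(len(text), [(3, 5)], "EMO")  # emotion expression span
nw = bio_tags(len(text), [(3, 5)], "NW")    # new word span
joint = joint_tags(emo, nw)
```

A labeler trained on `joint` sees new-word boundaries and emotion-expression boundaries in the same label, which is one way the overlap reported in the abstract could be exploited. Supplying a new-word flag as an input feature instead would be an equally plausible reading.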