Journal of University of Science and Technology of China ›› 2016, Vol. 46 ›› Issue (9): 719-726. DOI: 10.3969/j.issn.0253-2778.2016.09.002

• Original Paper •

Semantic similarity measurement based on low-dimensional sense vector model

CAI Yuanyuan, LU Wei   

  1. School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2016-03-11 Revised: 2016-09-17 Accepted: 2016-09-17 Online: 2016-09-17 Published: 2016-09-17

Abstract: Semantic similarity measurement improves the accuracy and efficiency of information retrieval, and it has therefore become one of the core components of text processing. To address lexical ambiguity such as polysemy, a sense vector model based on vector composition was proposed, which integrates a knowledge base with a corpus by fusing multiple semantic features derived from both. The model builds on continuous distributed word vectors and the inherent semantic properties of WordNet. First, continuous word vectors were trained in advance from a textual corpus using a neural network language model. Then, multiple kinds of semantic and relational information were extracted from WordNet to augment the original vectors and generate sense vectors for words. The semantic similarity between two concepts can thus be measured by the similarity of their sense vectors. Experimental results on benchmark data indicate that this measure outperforms state-of-the-art measures based on either WordNet or corpora alone. Compared with measures based on the original distributed word vectors, the proposed measure improves the Pearson correlation coefficient by 7.5%. The results also show that fusing multiple features contributes to measuring conceptual semantic similarity.
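
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the composition function, so the code below assumes one simple possibility: a sense vector is a weighted average of the word's own vector and the mean vector of words related to that sense, and similarity is the cosine of the two vectors. The toy vectors, the related-word lists (standing in for features that would come from WordNet), the function names, and the weight `alpha` are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose_sense_vector(word, related_words, word_vectors, alpha=0.5):
    """Blend a word vector with the mean vector of its sense-related words.

    Assumption: a weighted average is used as the composition function.
    alpha is the weight of the original word vector; related words without
    a vector are skipped.
    """
    base = word_vectors[word]
    known = [word_vectors[w] for w in related_words if w in word_vectors]
    if not known:
        return list(base)
    dim = len(base)
    mean_related = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return [alpha * base[i] + (1 - alpha) * mean_related[i] for i in range(dim)]


# Toy 3-dimensional vectors, made up for illustration; real vectors would be
# trained on a corpus with a neural network language model.
WORD_VECTORS = {
    "bank": [0.5, 0.5, 0.0],
    "money": [1.0, 0.0, 0.0],
    "deposit": [0.9, 0.1, 0.0],
    "river": [0.0, 1.0, 0.0],
    "shore": [0.1, 0.9, 0.0],
}

# Hypothetical related-word lists for two senses of "bank"; in practice these
# would be extracted from a knowledge base such as WordNet.
bank_finance = compose_sense_vector("bank", ["money", "deposit"], WORD_VECTORS)
bank_river = compose_sense_vector("bank", ["river", "shore"], WORD_VECTORS)

sim_finance_money = cosine(bank_finance, WORD_VECTORS["money"])
sim_river_money = cosine(bank_river, WORD_VECTORS["money"])
print(sim_finance_money > sim_river_money)
```

In this toy setting the single vector for "bank" sits halfway between the financial and the river contexts, while the two composed sense vectors move toward their respective neighbours, which is the effect the sense vector model aims for when handling polysemy.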

Key words: sense vector, feature fusion, distributed word embedding, semantic similarity
