Journal of University of Science and Technology of China ›› 2015, Vol. 45 ›› Issue (4): 314-320. DOI: 10.3969/j.issn.0253-2778.2015.04.009

• Original Paper •

A latent semantic analysis classification technique based on optimized categorization information

JI Duo, BI Chen, CAI Dongfeng   

  1. Cyber Crime Investigation Department, National Police University of China, Shenyang 110854, China; 2. Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang 110136, China
  • Received:2014-03-21 Revised:2014-11-04 Accepted:2014-11-04 Online:2014-11-04 Published:2014-11-04

Abstract: Latent semantic analysis (LSA) is an effective dimensionality reduction method that has been widely applied to text learning tasks such as information retrieval and text categorization. Focusing on the classification of professional literature, this paper analyzes the features of texts from the same category and from different categories under a strict classification system, taking patent document classification as an example, and proposes an optimized LSA classification technique based on categorization information. Using feature information from texts of different categories, the technique divides the original documents into a variety of pseudo-documents and strengthens the occurrence frequency of features exclusive to each category, thereby building an optimized latent semantic space and improving the performance of the classification model. Experimental results show that the method effectively improves precision when applied to text categorization.
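
The abstract does not give the construction in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a feature is "exclusive" when it occurs in exactly one category, and that each document yields one extra pseudo-document containing only its exclusive features, which raises their frequency before the SVD. The function names, the toy corpus, the choice of k = 2, and the nearest-centroid cosine classifier are all illustrative assumptions.

```python
import numpy as np

def exclusive_terms(docs, labels):
    """Terms that occur in exactly one category (assumed definition)."""
    seen = {}
    for doc, lab in zip(docs, labels):
        for tok in doc:
            seen.setdefault(tok, set()).add(lab)
    return {t for t, labs in seen.items() if len(labs) == 1}

def make_pseudo_docs(docs, labels):
    """Keep each document and add a pseudo-document holding only its
    category-exclusive terms (hypothetical splitting rule)."""
    excl = exclusive_terms(docs, labels)
    out_docs, out_labels = [], []
    for doc, lab in zip(docs, labels):
        out_docs.append(list(doc))
        out_labels.append(lab)
        only_excl = [t for t in doc if t in excl]
        if only_excl:
            out_docs.append(only_excl)
            out_labels.append(lab)
    return out_docs, out_labels

def term_doc_matrix(docs, vocab):
    """Raw term counts, one column per document; unknown terms are ignored."""
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            if tok in index:
                X[index[tok], j] += 1.0
    return X

def train(docs, labels, k=2):
    """Build a rank-k latent space from the pseudo-documents and
    compute one centroid per category in that space."""
    p_docs, p_labels = make_pseudo_docs(docs, labels)
    vocab = sorted({t for d in p_docs for t in d})
    X = term_doc_matrix(p_docs, vocab)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    coords = Vt[:k, :].T  # latent coordinates of each pseudo-document
    centroids = {}
    for lab in sorted(set(p_labels)):
        rows = [i for i, l in enumerate(p_labels) if l == lab]
        centroids[lab] = coords[rows].mean(axis=0)
    return vocab, U_k, s_k, centroids

def classify(model, doc):
    """Fold a new document into the latent space and pick the
    category whose centroid has the highest cosine similarity."""
    vocab, U_k, s_k, centroids = model
    q = term_doc_matrix([doc], vocab)[:, 0]
    z = (U_k.T @ q) / s_k

    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    return max(centroids, key=lambda lab: cos(z, centroids[lab]))

# Toy corpus (invented for illustration): "system" and "method" are shared
# by both categories, all other terms are exclusive to one category.
docs = [
    ["engine", "piston", "fuel", "system"],
    ["engine", "valve", "fuel", "method"],
    ["circuit", "signal", "antenna", "system"],
    ["circuit", "antenna", "receiver", "method"],
]
labels = ["mech", "mech", "elec", "elec"]

model = train(docs, labels, k=2)
prediction = classify(model, ["fuel", "piston", "system"])
```

In this sketch the extra pseudo-documents double the counts of category-exclusive terms while leaving shared terms such as "system" unchanged, so the leading singular directions are more strongly shaped by the exclusive terms. A real system would likely add term weighting and a much larger k.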

Key words: Latent Semantic Indexing, Term Co-occurrence, Text Categorization
