Journal of University of Science and Technology of China ›› 2016, Vol. 46 ›› Issue (10): 867-873. DOI: 10.3969/j.issn.0253-2778.2016.10.011

• Original Paper •

Insomnia discriminant analysis based on real-world clinical data

ZHU Wei

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China; 2. Shanghai Menorah Information Technology Co., Ltd., Shanghai 201801, China; 3. Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China; 4. National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
  • Received: 2016-03-01 Revised: 2016-09-16 Online: 2016-10-31 Published: 2016-10-31
  • Corresponding author: ZHANG Lei
  • About the author: ZHU Wei, male, born in 1992, master's degree. Research field: data mining. E-mail: zwtj2010@163.com
  • Supported by:
    National Natural Science Foundation of China (61273305, 81503680) and the Fundamental Research Funds for the Central Public Welfare Research Institutes (ZZ0908032).

Abstract: A data preprocessing method tailored to real-world traditional Chinese medicine (TCM) clinical data is proposed, which converts unstructured records into structured data. The symptom features obtained with this method are validated experimentally on both supervised and semi-supervised classification models. Experiments on a real-world medical dataset show that both supervised and semi-supervised classification algorithms achieve good classification performance on the features produced by the proposed preprocessing method. In addition, the label propagation algorithm is markedly more stable than classical classification algorithms and still obtains good results when the proportion of labeled data is low.

Key words: structurization, semi-supervised learning, label propagation algorithm, TCM, disease identification, insomnia
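
To make the two steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that structuring can be approximated by matching a small symptom vocabulary against free text, and it uses a textbook label propagation scheme (row-normalized RBF similarity graph with labeled samples clamped at every iteration). The vocabulary `VOCAB`, the example notes, the labels, and the parameter `sigma` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical symptom vocabulary; the paper's real feature set is not
# given in the abstract.
VOCAB = ["difficulty falling asleep", "early waking", "dreaminess",
         "cough", "fever"]


def structure(note):
    """Turn a free-text note into a binary symptom vector (substring match)."""
    text = note.lower()
    return [1 if term in text else 0 for term in VOCAB]


def similarity(a, b, sigma=1.0):
    """RBF kernel on the squared Euclidean distance between two vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def label_propagation(features, labels, n_classes, iters=100):
    """Propagate labels over a similarity graph.

    labels[i] is a class index, or None when sample i is unlabeled.
    Returns one predicted class index per sample.
    """
    n = len(features)
    # Pairwise similarities with a zero diagonal, then row-normalize so
    # each row is a probability distribution over neighbours.
    W = [[0.0 if i == j else similarity(features[i], features[j])
          for j in range(n)] for i in range(n)]
    T = [[w / sum(row) for w in row] for row in W]

    # Initial label distributions: one-hot if labeled, uniform otherwise.
    Y0 = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
          if labels[i] is not None else [1.0 / n_classes] * n_classes
          for i in range(n)]
    Y = [row[:] for row in Y0]

    for _ in range(iters):
        Y_new = [[sum(T[i][j] * Y[j][c] for j in range(n))
                  for c in range(n_classes)] for i in range(n)]
        # Clamp: labeled samples keep their known label distribution.
        for i in range(n):
            if labels[i] is not None:
                Y_new[i] = Y0[i][:]
        Y = Y_new

    return [max(range(n_classes), key=lambda c: Y[i][c]) for i in range(n)]


# Hypothetical notes: class 1 = insomnia, class 0 = other; None = unlabeled.
notes = [
    "Difficulty falling asleep, dreaminess",
    "Cough and fever for three days",
    "Early waking with dreaminess",
    "Fever, mild cough",
    "Difficulty falling asleep and early waking",
]
labels = [1, 0, None, None, None]

features = [structure(n) for n in notes]
predicted = label_propagation(features, labels, n_classes=2)
print(predicted)
```

Only two of the five notes carry a label here; the remaining three receive theirs through the similarity graph, which is the low-label setting the abstract refers to. A production setup would more likely use an established library implementation and a clinically curated vocabulary.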