Journal of University of Science and Technology of China, 2017, Vol. 47, Issue 10: 823-836. DOI: 10.3969/j.issn.0253-2778.2017.10.004

Distributed keyword approximate search method for RDF

CHEN Yuan, WANG Jingbin

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, Fujian, China
  • Received: 2016-08-28 Revised: 2016-12-08 Online: 2017-10-31 Published: 2017-10-31
  • Corresponding author: WANG Jingbin
  • About the author: CHEN Yuan, male, born in 1991, master's student. Research interest: data mining. E-mail: 727947930@qq.com
  • Supported by:
    the National Natural Science Foundation of China (61300104), the Fujian Province Science and Technology Fund for Supporting the Army (JG2014001), the Natural Science Foundation of Fujian Province (2012J01168), and the Science and Technology Development Fund of Fuzhou University (2013-XQ-32).

Abstract: Existing RDF keyword search methods mostly search the large-scale RDF data graph directly and do not make full use of the semantic information in the RDF ontology; the resulting excess of iterations degrades both search efficiency and result quality. To address these problems, a distributed keyword approximate search algorithm for RDF (DKASR) is proposed. It is built on a Redis in-memory database cluster and searches large-scale data in parallel on a distributed platform. The algorithm uses the semantic information of the RDF ontology to construct ontology sub-graphs, ranks them with a semantic scoring function, and then searches in parallel with the MapReduce computation model to return the Top-k results. If fewer than k results are returned, the ontology sub-graphs are extended to generate approximate ontology sub-graphs, which are ranked with a semantic similarity function, and the MapReduce parallel search is repeated until the Top-k results are returned. Experimental results show that DKASR performs RDF keyword approximate search efficiently and correctly and effectively returns the Top-k results.

Key words: RDF, keyword, approximate search, Redis, MapReduce
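
The two-phase control flow described in the abstract can be sketched as follows. This is a minimal single-process illustration under stated assumptions, not the authors' implementation: the names `top_k_search`, `score`, `expand`, `similarity`, and `search` are hypothetical placeholders for the semantic scoring function, sub-graph extension, semantic similarity function, and MapReduce search step, none of which are specified in the abstract. The `max_rounds` cap is also an assumption, added only so the loop is guaranteed to terminate.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

S = TypeVar("S")  # an ontology sub-graph (its representation is an assumption)
R = TypeVar("R")  # a search result


def top_k_search(
    subgraphs: Sequence[S],
    score: Callable[[S], float],          # stand-in for the semantic scoring function
    expand: Callable[[S], Iterable[S]],   # stand-in for sub-graph extension
    similarity: Callable[[S], float],     # stand-in for the semantic similarity function
    search: Callable[[S], Iterable[R]],   # stand-in for the parallel MapReduce search
    k: int,
    max_rounds: int = 3,                  # assumption: cap on expansion rounds
) -> List[R]:
    """Exact phase first, then approximate expansion until k results are found."""
    results: List[R] = []

    def collect(candidates: Iterable[S]) -> None:
        # A plain loop here; the abstract describes this step as a parallel job.
        for sg in candidates:
            for r in search(sg):
                if r not in results:
                    results.append(r)
                if len(results) >= k:
                    return

    # Phase 1: rank ontology sub-graphs by semantic score, best first.
    frontier = sorted(subgraphs, key=score, reverse=True)
    seen: List[S] = list(frontier)
    collect(frontier)

    # Phase 2: while short of k, extend to approximate sub-graphs and search again.
    rounds = 0
    while len(results) < k and frontier and rounds < max_rounds:
        approx: List[S] = []
        for sg in frontier:
            for candidate in expand(sg):
                if candidate not in seen:
                    seen.append(candidate)
                    approx.append(candidate)
        frontier = sorted(approx, key=similarity, reverse=True)
        collect(frontier)
        rounds += 1

    return results[:k]


# Toy data (entirely made up): sub-graphs are strings, "~" marks an approximate one.
SCORES = {"A": 0.9, "B": 0.5}
SIMILARITIES = {"A~": 0.8, "B~": 0.6}
MATCHES = {"A": ["r1"], "B": ["r2"], "A~": ["r3"], "B~": ["r4"]}


def toy_expand(sg: str) -> List[str]:
    return [] if sg.endswith("~") else [sg + "~"]


print(top_k_search(["B", "A"], SCORES.get, toy_expand, SIMILARITIES.get, MATCHES.get, k=3))
# prints ['r1', 'r2', 'r3']
```

In the toy run, the exact phase yields only two results, so one expansion round is needed to reach k = 3. When k exceeds everything reachable, the function returns whatever it found rather than looping indefinitely.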
