中国科学技术大学学报 ›› 2020, Vol. 50 ›› Issue (8): 1058-1063.DOI: 10.3969/j.issn.0253-2778.2020.08.003

• 论著 • 上一篇    下一篇

基于全局的引文网络影响力最大化算法

张文静,班志杰   

  1. 1.内蒙古大学计算机学院,内蒙古自治区社会计算与数据处理重点实验室,内蒙古呼和浩特 010000; 2.呼和浩特市规划展览馆, 内蒙古呼和浩特 010000
  • 收稿日期:2020-06-05 修回日期:2020-07-28 接受日期:2020-07-28 出版日期:2020-08-31 发布日期:2020-07-28
  • 通讯作者: 班志杰
  • 作者简介:张文静,女,硕士,研究方向:数据挖掘. E-mail: 2501648350@qq.com
  • 基金资助:
    国家自然科学基金(61662053) 资助.

Citation network’s influence maximization algorithm based on global influence

ZHANG Wenjing, BAN Zhijie   

  1. 1. Inner Mongolia A.R. Key Laboratory of Data Mining and Knowledge Engineering, College of Computer, Inner Mongolia University, Hohhot 010000, China; 2. Hohhot Historical and Cultural City and Intangible Cultural Heritage Protection Center, Hohhot 010000, China
  • Received:2020-06-05 Revised:2020-07-28 Accepted:2020-07-28 Online:2020-08-31 Published:2020-07-28

摘要: 从大量的期刊论文中搜寻出最具有影响力的若干篇论文对于学术研究具有重要意义,但现有影响力最大化算法需要结合贪心算法,时间复杂度较高.依据论文引用网络中引用关系的时间单向性和无环特征,提出一种基于节点全局影响力的影响力最大化算法.该算法主要包括: ①计算所有节点的全局影响力.结合引文网络的发表时间特性,构造上三角稀疏影响方阵.在线性阈值传播模型的基础上,利用节点间的直接、间接路径影响以及累积计算规则模拟影响力在网络上的传播过程.方阵每进行一次运算,会将全部节点的影响向下传播一跳,得到下一个路径的影响,并统计全部影响,最终得到表示所有节点全局影响力的方阵;②将全部节点按全局影响力排序.选择前n个节点作为候选节点来选取k个种子节点,在选取的过程中避免影响力较大节点的聚集情况.以真实的学术引文网络数据集为实验数据,将提出的算法与两种基准算法从激活范围和运行时间两个方面进行对比.实验结果表明,该算法大大降低了时间复杂度,且激活范围接近于贪心算法.

关键词: 引文网络, 社交网络, 影响力最大化, 传播模型

Abstract: It is of great significance for academic researches to search out the most influential papers from a huge number of Journal papers. However, the existing algorithms for maximizing influence need to be combined with greedy algorithm, which increases the time complexity. According to the time unidirectional and acyclic features of the citation relationship in the citation network, an algorithm is proposed to maximize the influence based on the global influence of nodes. The algorithm mainly includes: ①Calculating the global influence of all nodes. Combined with the publication time characteristics of the citation network, the upper triangular sparse influence matrix is constructed. On the basis of the linear threshold propagation model, the direct and indirect path effects between nodes and the cumulative calculation rule are used to simulate the propagation process of influence on the network. Every time the square matrix is calculated, the influence of all nodes will be propagated down one hop to get the influence of the next path, and all the influences will be counted to finally get the square matrix representing the global influence of all nodes; ②All nodes will be ranked according to the global influence, and the first n nodes will be selected as candidate nodes to select k seed nodes. By the cumulative calculation rule, the proposed algorithm avoids the overlapping of influence among nodes during the process of selecting seed nodes. The real academic citation network data set is taken as the experimental sample, and our algorithm is compared with the two benchmark algorithms in terms of activation range and running time. Experimental results show that the proposed algorithm greatly reduces the time complexity, and that the activation range is close to the greedy algorithm.

Key words: citation network, social network, influence maximization, propagation model

中图分类号: