中国科学技术大学学报 ›› 2014, Vol. 44 ›› Issue (1): 1-11.DOI: 10.3969/j.issn.0253-2778.2014.01.001

• 论著 •    

基于状态子集编码的快速DFA构造算法

彭坤杨   

  1. 中国科学技术大学计算机科学与技术学院,安徽合肥 230027
  • 收稿日期:2013-03-26 修回日期:2013-04-23 接受日期:2013-04-23 出版日期:2013-04-23 发布日期:2013-04-23
  • 作者简介:彭坤杨,男,1986年生,博士生. 研究方向:网络与系统安全,高性能计算体系结构. E-mail:pengkuny@mail.ustc.edu.cn
  • 基金资助:
    中国科学技术大学博士研究生学术新人项目资助.

A fast DFA construction algorithm by subset encoding

PENG Kunyang   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  • Received:2013-03-26 Revised:2013-04-23 Accepted:2013-04-23 Online:2013-04-23 Published:2013-04-23

摘要: 网络深度包检测等网络应用广泛采用正则表达式匹配技术检测网络中的传输内容,正则表达式用非确定性有限自动机(NFA)或者确定性有限自动机(DFA)实现.网络应用对匹配速度要求很高,相比NFA,DFA具有确定性的匹配速度,但所有基于DFA的方法需要预先从NFA构造一个与之等价的DFA,于是DFA的构造成为系统瓶颈之一.为此通过深入探索自动机内在运行特性——NFA状态间活跃关系和NFA中导致DFA空间膨胀的因素,设计了一种NFA状态子集的编码方法和查询方法,显著减少了DFA构造过程中状态子集的查询代价.基于入侵检测与防护系统Snort中的真实规则集的实验表明,与传统的子集构造算法相比,该方法减少了8833%~9357%的DFA构造时间.

关键词: NFA, DFA, 正则表达式匹配, 深度包检测

Abstract: Regular expression matching is the foundation of many network functions such as deep packet inspection, which is performed using either non-deterministic finite automaton (NFA) or deterministic finite automation (DFA). To meet the requirement of high-speed regular expression matching, DFA has been the prevalent choice for its deterministic nature and matching efficiency. However, all these DFA-based approaches need to construct a DFA from an NFA as an intermediate step, thus the DFA construction process can be one of bottlenecks for the system. By exploring the inherent properties of finite automaton — whether the NFA states simultaneously active and how the self-looping NFA states lead to explosion of DFA state space — a state subset encoding and searching scheme was designed, and a new DFA construction algorithm was proposed. Through experiments based on real life pattern sets from the Snort intrusion detection system, the new DFA construction algorithm was demonstrated to reduce the running time by 8833%~9357% compared with that of the standard subset construction algorithm.

Key words: NFA, DFA, regular expression matching, deep packet inspection

中图分类号: