Journal of University of Science and Technology of China, 2020, Vol. 50, Issue (7): 1019-1025. DOI: 10.3969/j.issn.0253-2778.2020.07.020

• Original Paper •

A malicious domain name detection method based on CNN

DU Shuying, DU Peng, DING Shifei   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; 2. School of Information Management, Xuzhou Vocational College of Bioengineering, Xuzhou 221000, China
  • Received:2020-06-03 Revised:2020-06-21 Accepted:2020-06-21 Online:2020-07-31 Published:2020-06-21

Abstract: In recent years, botnet-based cyber attacks have become one of the main threats to cyber security. Many malware families use a domain generation algorithm (DGA) to automatically generate large numbers of pseudo-random domain names with which to reach their command-and-control (C&C) servers. This paper focuses on detecting and classifying such pseudo-random domain names with a convolutional neural network (CNN). It first briefly introduces the hazards and basic principles of botnets and the role that pseudo-random domain names play in them. After analyzing the principle of DGAs and the shortcomings of traditional DGA domain name recognition algorithms, it concentrates on a CNN-based method for recognizing such domain names. The basic concepts of CNNs are illustrated through simple neural network training experiments, and the effect of different hyperparameters and activation functions on the model's classification performance is compared by simulation. The analysis of the results reports the accuracy and loss of the CNN model on domain name identification, together with the accuracy, recall, F1 score, and ROC curve. All indicators show that the model classifies well, and it is concluded that CNN-based recognition of DGA-generated domain names is a reliable method.
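To make the abstract's two central ideas concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy hash-based DGA (real malware families each use their own scheme) and a hypothetical character-to-index encoding of the kind typically fed to an embedding layer in front of a CNN. The names `toy_dga`, `encode_domain`, `ALPHABET`, the seed format, and the padding length are all assumptions introduced for illustration.

```python
import hashlib
import string

# Hypothetical alphabet for character-level encoding (an assumption, not from the paper).
# Index 0 is reserved for padding; unknown characters also map to 0 in this sketch.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def toy_dga(seed: str, count: int, length: int = 12, tld: str = ".com") -> list[str]:
    """Derive `count` pseudo-random domain names deterministically from a seed.

    Both the malware and its operator can run the same function with the same
    seed (for example, the current date) and arrive at the same candidate list.
    `length` must not exceed 32, the size of a SHA-256 digest in bytes.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode()).digest()
        # Map each digest byte onto a lowercase letter.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


def encode_domain(domain: str, max_len: int = 32) -> list[int]:
    """Map characters to integer indices, truncating or zero-padding to max_len."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))


if __name__ == "__main__":
    for name in toy_dga("2020-07-31", 3):
        print(name, encode_domain(name))
```

Under these assumptions, each domain name becomes a fixed-length integer sequence, which a classifier could pass through an embedding layer and one-dimensional convolutions before a binary output.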

Key words: domain generation algorithm (DGA), word embedding, deep learning, convolutional neural network (CNN)
