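1. School of Mechanical Electronic and Information Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
2. Network and Information Center, China University of Mining and Technology (Beijing), Beijing 100083, China
3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100084, China
4. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China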
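HUO Yuehua (b. 1981), male, senior engineer, E-mail: [email protected];
WU Wenhao (b. 1997), male, master's student at China University of Mining and Technology (Beijing), E-mail: [email protected];
ZHAO Faqi (b. 1997), male, master's student at China University of Mining and Technology (Beijing), E-mail: [email protected];
WANG Qiang (b. 1997), male, doctoral student at the Institute of Information Engineering, Chinese Academy of Sciences, E-mail: [email protected]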
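HUO Yuehua, WU Wenhao, ZHAO Faqi, et al. Multi-View Encrypted Malicious Traffic Detection Method Combined with Co-Training[J]. Journal of Xidian University, 2023, 50(4):139-147. DOI: 10.19665/j.issn1001-2400.2023.04.014.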
Machine learning-based methods for detecting malicious traffic encrypted with the Transport Layer Security (TLS) protocol depend heavily on labeled samples. To address this problem, a semi-supervised TLS encrypted malicious traffic detection method is proposed. With only a small number of labeled samples, a co-training strategy is used for the first time to combine two views of the encrypted traffic; unlabeled samples are introduced into training to enlarge the sample set and thereby reduce the dependence on labeled samples. First, flow metadata features and certificate features, which are strongly independent of each other, are extracted from the encrypted traffic features and used to construct the two views for co-training. Second, an XGBoost classifier and a random forest classifier are built for the two views, respectively. Finally, the two classifiers are combined through the co-training strategy into a multi-view co-training detection model, which is trained with a small set of labeled samples and a large number of unlabeled samples. On a public dataset, the model achieves an accuracy of 99.17%, a recall of 98.54%, and a false positive rate below 0.18%. The experimental results show that the proposed method effectively reduces the dependence on labeled samples when only a small number of labeled samples are available.
Keywords: co-training; transport layer security; multi-view; feature selection; semi-supervised learning
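The co-training procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free, a hypothetical `CentroidClassifier` stands in for both the XGBoost and the random forest classifier, the two views are synthetic toy data rather than flow metadata and certificate features, and the names `co_train`, `detect`, `make_toy_flows`, the confidence threshold, and the rule for combining the two views are all assumptions made for illustration.

```python
import math
import random


class CentroidClassifier:
    """Hypothetical stand-in for the XGBoost / random forest classifiers."""

    def fit(self, rows, labels):
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, row):
        """Return (label, confidence); confidence is 1.0 at a class centroid."""
        dists = {c: math.dist(row, mu) for c, mu in self.centroids.items()}
        label = min(dists, key=dists.get)
        total = sum(dists.values())
        return label, (1.0 - dists[label] / total if total > 0 else 1.0)


def co_train(view_a, view_b, labels, rounds=5, per_round=2, threshold=0.7):
    """Co-train two classifiers; labels[i] is 0/1, or None if unlabeled."""
    labels = list(labels)  # copy so the caller's list is not modified
    clf_a, clf_b = CentroidClassifier(), CentroidClassifier()

    def fit_both():
        known = [i for i, y in enumerate(labels) if y is not None]
        targets = [labels[i] for i in known]
        clf_a.fit([view_a[i] for i in known], targets)
        clf_b.fit([view_b[i] for i in known], targets)

    for _ in range(rounds):
        fit_both()
        pool = [i for i, y in enumerate(labels) if y is None]
        chosen = {}
        for clf, view in ((clf_a, view_a), (clf_b, view_b)):
            # Each view nominates the unlabeled flows it is most confident about.
            scored = sorted(((clf.predict(view[i]), i) for i in pool),
                            key=lambda item: item[0][1], reverse=True)
            for (label, conf), i in scored[:per_round]:
                if conf >= threshold:
                    chosen.setdefault(i, label)  # first view to claim a flow wins
        if not chosen:
            break
        for i, label in chosen.items():
            labels[i] = label  # pseudo-label joins the shared training set
    fit_both()  # refit on the final enlarged training set
    return clf_a, clf_b, labels


def detect(clf_a, clf_b, row_a, row_b):
    """Assumed combination rule: trust whichever view is more confident."""
    return max(clf_a.predict(row_a), clf_b.predict(row_b), key=lambda p: p[1])[0]


def make_toy_flows(n_per_class=20, seed=0):
    """Synthetic two-view data: class 0 (benign) near 0, class 1 (malicious) near 5."""
    rng = random.Random(seed)
    view_a, view_b, truth = [], [], []
    for cls in (0, 1):
        for _ in range(n_per_class):
            center = 5.0 * cls
            view_a.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            view_b.append([rng.gauss(center, 1.0)])
            truth.append(cls)
    return view_a, view_b, truth


if __name__ == "__main__":
    view_a, view_b, truth = make_toy_flows()
    seeds = {0, 1, 20, 21}  # only four flows start with labels
    labels = [truth[i] if i in seeds else None for i in range(len(truth))]
    clf_a, clf_b, enlarged = co_train(view_a, view_b, labels)
    print("labeled after co-training:", sum(y is not None for y in enlarged))
    print("prediction for a new flow:", detect(clf_a, clf_b, [4.8, 5.1], [5.3]))
```

In a realistic setting each stand-in would be replaced by a real classifier that exposes class probabilities, and the confidence threshold and the number of pseudo-labels added per round would need tuning, since wrongly pseudo-labeled samples feed back into both views.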